1. Introduction


Agility  in  tactical  decision  making,  mission  management,  and  control  is  the  key  attribute 
for  enabling  heterogeneous  multi-unmanned  vehicle  (UxV)  teams  to  successfully  manage 
the  “fog  of  war”  with  its  inherent  complex,  ambiguous,  and  time-challenged  conditions. 
Mission effectiveness will rely on rapid identification and management of uncertainties
that  can  disrupt  an  autonomous  team’s  ability  to  complete  complex  operations  safely.  As  a 
result,  many  of  today’s  operators  use  complex  human-machine  systems  on  a  daily  basis. 
Further,  as  operators  have  to  plan  and  direct  multiple  UxVs  simultaneously  to  achieve 
mission  objectives,  human  operators  are  often  not  able  to  maintain  efficient  and  effective 
performance  (Chen  and  Barnes  2012a).  These  decrements,  which  may  lead  to  both  mission 
failure  and  loss  of  life  and  property,  may  partially  stem  from  the  high  information  flow  rate 
required  to  supervise  multiple  UxVs  concurrently  (Paas  and  Merrienboer  1994).  Thus  it  is 
necessary  to  lower  the  cognitive  load  placed  on  the  operator  (Hwang  et  al.  2008)  by 
presenting  appropriate  information  when  needed  (Lyons  and  Havig  2014). 

To decide what information is presented to the operator, intelligent agents (IAs) have been
created to perform the role of an intermediary between the operator and each individual
unmanned vehicle. In artificial intelligence, an agent is defined as anything
that has the ability to perceive its environment through sensors and act upon its
environment (Russell and Norvig 2009). An IA is an agent that has some level of autonomy,
meaning it can act with limited authority from others and is responsible for reaching
decisions (Russell and Norvig 2009). We specifically use the term IA to denote a software
agent that is incorporated into a human-machine system for the purpose of shared decision
making between the IA and the system’s operator (e.g., Chen et al. 2014). Thus, instead of
manually issuing commands to each UxV, the operator acts as a supervisor receiving
feedback from and providing instructions to an IA that relays these commands to the UxVs
to accomplish their shared mission (Chen and Barnes 2012b).

In this design the human operator always has the ultimate decision authority, which is an
example of mixed-initiative decision making as defined by Goodrich (2010). However,
with increasing levels of autonomy, human operators may not understand the information
provided by the IA (i.e., automatically generated plans) because of difficulties
understanding the rationale behind the decision-making processes (Linegang et al. 2006).
This lack of understanding may lead to disuse of or overreliance on the system (Parasuraman
and Riley 1997). To alleviate this problem and facilitate fluid mixed-initiative decision
making between the human operator and the IA, the agent-user interface must support
optimal transparency, conveying the rationale behind its recommendations without
burdening the operator with an overwhelming amount of data (Lee and See 2004). One
approach that has been used successfully in the literature is the Playbook
architecture (Miller et al. 2004). In this technique the human operator acts similarly to a
coach of a sports team who conveys his or her goals and directs specific behaviors by
calling a particular play (a specified command that conveys specific behaviors to be
completed), and the UxVs act as the “players” who autonomously carry out the instructions
contained in the play.

1.1  Automation  Transparency 

The 3 most common challenges for humans interacting with highly automated systems are
understanding the current system state, comprehending the reasons for its current behavior,
and projecting what its next behavior will be (Sarter and Woods 1995). In response to those
3 challenges, transparency in automated systems has become a critical research
question. Agent transparency is the IA’s ability to communicate information to the human
operator in a clear and efficient manner, which allows the operator to develop an accurate
mental model of the system and its behavior, leading to calibrated trust in the system (Chen
et al. 2014; Lee and See 2004).

Previous research has recommended that the system should make its purpose, process, and
performance (3Ps) and a history of the 3Ps available to the operator to increase the operator’s
understanding of the system (Lee and See 2004). Lee and See stated that both system
capabilities and limitations should also be shown to the operator to assist in decision
making. However, to reduce operator workload, this information should be in a simplified
form to limit the amount of processing required for understanding and not overwhelm the
operator (Cook and Smallman 2008; Neyedli et al. 2011). Thus, a transparent system
should maximize operator decision-making performance and allow the operator to
maintain overall situation awareness (SA) not only of the mission environment, but also of
the state and intent of the system itself (Chen et al. 2011; Endsley 1995).

The SA-based agent transparency (SAT) model (Chen et al. 2014) leveraged this
effectiveness requirement and developed a useful theoretical framework to determine what
type of information to display to an operator to maximize their situation awareness and
assist the operator in developing an accurate mental model, creating calibrated system trust.
The SAT model builds upon the SA theory developed by Endsley (1995), the beliefs,
desires, and intentions agent framework (Rao and Georgeff 1995), the 3P model (Lee and
See 2004), and our previous work (Chen and Barnes 2012a; 2012b) (see Fig. 1).
SA-based Agent Transparency (SAT) Model

Level 1: What’s going on and what is the agent trying to achieve?
• Purpose
• Desire (Goal selection)
• Process
• Intentions (Planning/Execution)
• Progress
• Performance

Level 2: Why is the agent doing it?
• Reasoning process (Belief) (Purpose)
• Environmental & other constraints

Level 3: What should the operator expect to happen?
• Projection to Future/End State
• Potential limitations
• Likelihood of error
• History of Performance

Fig. 1 SA-based Agent Transparency model diagram (Chen et al. 2014)

In the first level, the operator is presented with basic information about the state of the
world and the IA, such as the agent’s current state and goals, intentions, and proposed
actions. The second level builds connections between these basic pieces of rationale
information to display the agent’s current state and goals, intentions, and the reasoning behind
its proposed actions. Finally, the third level provides the operator with information
regarding the projection of future states of the system, such as the predicted consequences
of the IA’s decisions and any uncertainties associated with the system’s actions (Chen et al.
2014). Previous research has supported the display of information that supports agent
transparency to the operator as a way of mitigating uncertainties regarding a system’s
performance (Lyons and Havig 2014). Additionally, displaying a system’s reliability,
which has led to human operators adopting optimal reliance strategies (Wang et al. 2009),
is similar to SAT Level 3, which suggests history of past performance can support optimal
decision making (Chen et al. 2014). The benefits of including SAT-based information in
an automated system are further supported by the notion that humans recalibrate their trust
following automation failures when aware of system limitations (Dzindolet et al. 2003).

1.2  Trust  in  Automation 

Proper  calibration  of  trust  is  critical  in  high-risk  situations,  such  as  military  operations 
(Groom  and  Nass  2007;  Lee  and  See  2004).  Over-reliance  on  an  automated  system  when 
it  is  not  appropriate  (automation  misuse)  can  lead  to  dangerous  consequences,  such  as  loss 
of life and property, while disusing the system when it can provide a benefit (automation
disuse) is also erroneous, as the system could provide the operator with lower workload,
faster response time, or greater performance, the absence of which may be costly to the
overall mission (Parasuraman and Riley 1997). Thus it is important for the operator to
develop a properly calibrated trust in the system. Calibrated trust means that the operator
has an accurate mental model of the system, relies on the system within the system’s
capabilities, and is cognizant of its limitations, which leads the operator to override the
system in situations outside of its capabilities (Lee and See 2004).

Recent  research  suggested  that  the  calibration  of  trust  depends  not  only  on  the  system’s 
reliability  but  also  on  the  perceived  workload  and  usability  (Hoff  and  Bashir  2015).  In  other 
words,  displays  that  have  more  information  to  support  transparency  will  be  rated  as  more 
usable  and  more  trustworthy  because  it  is  easier  for  the  operator  to  form  an  accurate  mental 
model  of  the  system’s  3Ps;  however,  more  information  does  not  always  equate  to  relevant 
and  good  information.  If  the  increased  information  processing  requirements  caused  by  the 
additional  information  shown  to  the  operator  increase  workload,  the  display  may  be  seen 
as  less  usable  and  will  be  trusted  less.  Therefore,  we  hypothesize  that  participant  ratings  of 
trust  will  increase  linearly  with  increases  in  transparency  level  as  the  information  displayed 
was  developed  based  on  the  SAT  model. 

1.3  Workload 

Another  concern  regarding  autonomous  systems  is  operator  workload,  which  is  the  cost  of 
performing  a  task  that  reduces  an  individual’s  ability  to  complete  additional  tasks  (Cain 
2007).  Increased  operator  workload  decreases  performance  and  SA,  and  leads  to  incorrect 
automation  usage  decisions  (Beck  et  al.  2007;  Chen  and  Barnes  2012b;  Parasuraman  and 
Riley  1997).  Operator  workload  is  also  a  concern,  as  it  may  increase  as  agent  transparency 
increases.  Chen  et  al.  (2014)  stated  that  to  support  increased  agent  transparency,  additional 
elements  must  be  added  to  the  interface;  Lyons  and  Havig  (2014)  further  stated  that  these 
additions  may  lead  the  operator  to  process  more  information,  increasing  workload. 
Conversely,  the  additional  interface  elements  inform  the  operator  of  the  current  state, 
rationale,  and  future  state  projections  so  that  the  operator  does  not  have  to  make  these 
connections  themselves,  which  may  decrease  their  workload  (Chen  et  al.  2011). 

Consequently,  the  effect  of  increased  agent  transparency  on  workload  is  unclear.  Thus 
workload  will  be  an  important  factor  in  the  current  experiment.  We  hypothesize  that 
workload will decrease with increased transparency level because the
information supporting agent transparency in the system is designed to lower operator
cognitive  load.  However,  we  also  note  that  increased  workload  is  a  valid  concern  when 
additional  information  is  added  to  an  interface  and  increased  workload  may  decrease  both 
trust  in  the  system  and  perceived  usability. 
1.4 Usability

The International Organization for Standardization (ISO) defined usability as a user’s
effectiveness, efficiency, and satisfaction in a specific task context (Bevan 2009; ISO
2008). Previous research has found that greater usability was associated with more trust in
automated systems (Wang et al. 2009) and calibrated trust while using automated decision
aids (McBride and Morgan 2010). Further, displays that present information to support
agent transparency may require presenting more, and potentially more complex, information to
operators; thus usability is a paramount concern when designing transparent autonomous
systems (Bevan 2009; Scholtz and Consolvo 2004). We hypothesize that usability scores
will increase with transparency level because of the previously hypothesized decrease in
workload and increases in trust. In other words, the system will be perceived as more usable
as it provides more information supporting transparency to the operator. This hypothesis
is also based on previous work indicating that workload decreases the effectiveness of a
system (Bevan and Macleod 1994).

1.5  Individual  Differences 

The  effects  of  individual  differences  (IDs)  on  operator  decision-making  performance, 
workload,  trust,  and  usability  were  evaluated  in  the  present  study.  Several  key  individual 
differences  were  identified  as  relevant:  perceived  attentional  control  (PAC),  spatial  ability, 
working  memory  capacity  (WMC),  and  gaming  experience  (GE). 

1.5.1  Perceived  Attentional  Control 

Attentional  control  refers  to  an  individual’s  ability  to  self-regulate  and  enact  effortful 
control  over  their  attentional  processes  (Derryberry  and  Reed  2002).  This  ability  assists 
individuals  in  determining  which  stimuli  in  the  environment  to  direct  their  attention  toward 
and  assists  in  switching  their  attention  between  tasks  (Astle  and  Scerif  2009).  PAC  is  an 
individual’s  self-report  of  their  ability  to  direct  effortful  control  over  their  attentional 
processes  (Derryberry  and  Reed  2002).  Individual  differences  in  PAC  have  been  evaluated 
in  previous  studies  involving  supervisory  control  of  multiple  UxVs  and  may  be  an 
important predictor of performance in human-robot interaction tasks (Chen and
Barnes 2012b; Wright et al. 2013). Therefore, we hypothesize that individuals with
greater PAC will be better able to calibrate their trust in the system by more quickly
detecting issues with the IA. These individuals may also report lower workload
across all transparency levels.

1.5.2  Spatial  Ability 

Spatial ability is another potentially important variable to consider when explaining
performance differences in supervisory multi-UxV systems. Previous research has found
that individuals with greater spatial abilities not only made fewer performance errors on a
robotic navigation task (Lathan and Tracey 2002) but also outperformed less spatially
skilled individuals in threat detection tasks conducted while monitoring robotic
performance (Chen and Barnes 2012a; Chen et al. 2008), in vehicle identification tasks
(Fincannon et al. 2013), and during both direct line of sight and teleoperation navigation
performance (Long 2011). We hypothesize that individuals with greater spatial abilities
will specifically exhibit greater performance when given less information, where spatial
relations among assets are more valuable, and may also exhibit lower workload during the
experiment.

1.5.3  Working  Memory  Capacity 

Greater  WMC  has  been  found  to  be  associated  with  greater  multitasking  ability.  High 
scores  on  the  Operation  Span  (OSPAN),  a  measure  of  WMC,  have  been  linked  to  greater 
performance on UxV tasks (de Visser et al. 2010). Further, when mental
resources such as working memory are overloaded, individuals must expend more
effort, and performance will decrease (Wickens 2008). We therefore hypothesize that
individuals  with  greater  WMC  will  self-report  lower  workload  during  the  experiment  and 
will  have  faster  response  times  as  a  result  of  being  able  to  more  quickly  and  efficiently 
process  the  presented  information. 

1.5.4  Action  Gaming  Experience 

Greater  GE  has  previously  been  shown  to  increase  accuracy  and  SA  during  multitasking 
situations  (Chen  and  Barnes  2012b;  Cummings  et  al.  2010).  In  fact,  playing  video  games 
may  assist  individuals  to  develop  strategies  that  can  successfully  be  used  to  increase 
performance on other tasks. For example, one study found that experienced action video
game players (AVGPs), i.e., individuals who play action games such as first-person
shooters, outperformed nongamers in a change blindness task because they employed a
broad search strategy. The nongamers, on the other hand, employed a more elaborate and
costly strategy that required additional time (Clark et al. 2011). We hypothesize, therefore, that
action  GE  will  be  associated  with  faster  reaction  times  throughout  the  experiment. 

1.6  Current  Study 

In the current study we simulated a heterogeneous multi-UxV planning task in which
participants took on the role of a UxV operator whose job was to supervise vehicles and
direct them to carry out missions while managing the commander’s intent plus vehicle and
environmental constraints. Operators managed a team of 6 vehicles (2 unmanned aerial
vehicles [UAVs], 2 unmanned ground vehicles [UGVs], and 2 unmanned surface vehicles
[USVs]) to complete 3 blocks of 8 experimental events, 1 block for each transparency level, for
a total of 24 discrete missions, using a simulator loosely based on the US Air Force Research
Laboratory’s (AFRL’s) FUSION system. The simulator uses an approach similar to the
Playbook system developed by Miller et al. (2004) to engage the participant (called the
operator) in mixed-initiative decision making. In the current experiment the simulator
provides the operator with a particular play to call; however, it seeks input from the human
by suggesting 2 similar plans (Plans A and B) to achieve the play. Each mission began with
participants monitoring UxV positions and status. During this time they
received 4 messages, each containing one of the following: patrol reports, updates on vehicle
status, or commander intent messages. Two of the messages were relevant to the operators’
task while 2 were irrelevant; messages were presented in a randomized order. Afterward,
participants were given an objective to complete (e.g., locate a missing person or defuse an
improvised explosive device [IED]) along with 2 plans (Plans A and B) that may achieve
that objective. The participants’ task was to use information given to them by the IA to
choose the best plan to complete each mission.

This experiment manipulated interface transparency level and provided operators
with either a SAT Level 1 (basic plan information only), SAT Level 1+2 (basic plan information
and reasoning), or SAT Level 1+2+3 (basic plan information, reasoning, and projections
of uncertainty) interface. Our primary goal was to determine how increased information
supporting agent transparency based on the SAT model would affect operators’ trust in
the IA, workload, and their perceived usability of the system. Our secondary goal was to
determine how individual differences affected the relationship between transparency and
trust, as well as workload and usability. Finally, we were also interested in
understanding participants’ decision-making strategy and utilization of different elements of
the interface to determine which parts of the display were useful to the participants.
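
The mission structure described in this section (3 blocks of 8 missions, 1 block per transparency level, with 4 randomized messages per mission) can be sketched as a small schedule generator. This is only an illustrative sketch: the function name, the record fields, the fixed seed, and the order in which blocks are listed are assumptions made here and are not taken from the study.

```python
import random

# Transparency conditions named in the text. The order in which blocks are
# listed here is an assumption; the text does not state how blocks were ordered.
SAT_LEVELS = ("Level 1", "Level 1+2", "Level 1+2+3")
MISSIONS_PER_BLOCK = 8
RELEVANT_MESSAGES = 2
IRRELEVANT_MESSAGES = 2


def build_schedule(seed=0):
    """Return 24 hypothetical mission records (3 blocks x 8 missions)."""
    rng = random.Random(seed)  # fixed seed only so the sketch is repeatable
    schedule = []
    for block_number, sat_level in enumerate(SAT_LEVELS, start=1):
        for mission_number in range(1, MISSIONS_PER_BLOCK + 1):
            # Two task-relevant and two irrelevant messages, in random order.
            messages = (["relevant"] * RELEVANT_MESSAGES
                        + ["irrelevant"] * IRRELEVANT_MESSAGES)
            rng.shuffle(messages)
            schedule.append({
                "block": block_number,
                "sat_level": sat_level,
                "mission": mission_number,
                "messages": messages,
            })
    return schedule


schedule = build_schedule()
```

Each record carries the block's transparency level, so downstream measures (trust, workload, usability) could be grouped by `sat_level` in a within-subjects analysis.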

2.  Method 


2.1  Participants 

Thirty-five  participants,  recruited  using  an  online  participant  pool,  completed  the 
experiment,  and  5  were  removed  from  the  study.  Two  participants  were  removed  due  to 
technical  issues,  2  because  they  did  not  pass  the  evaluation,  and  1  because  of  a  failed  color 
vision  test.  Overall,  30  young  adults  in  the  Orlando,  FL,  area  (18  men  and  12  women) 
between the ages of 18 and 29 (M = 21.23, SD = 2.33) participated in this study. Participants
were compensated $15/hr for their participation.
2.2 Apparatus


2.2.1  Simulator 

A customized simulator, based on the AFRL’s FUSION multi-UxV planning system
(Spriggs et al. 2014), was created to support the current study. The simulator consisted of
a standard desktop computer, one 60.96-cm (24-inch) monitor, one standard Windows
keyboard, one standard 2-button mouse, 2 desktop speakers, and a customized software
program. The simulator included several sections: a video window where participants
watched UxV movements and received intelligence (Intel) messages, a mission assignment
window where participants received a mission objective, and a decision window where
participants received an asset capability tile to inform them of relevant information:
vehicle capabilities, mission synopsis, Intel, and both of the IA’s plan suggestions (Plans A
and B).

Participants evaluated the 2 plans and selected the best plan based on their judgment.
Additionally, participants were instructed to use 3 metrics to evaluate each plan: Speed,
Coverage, and Capabilities. Speed was defined as how quickly each vehicle can arrive or
carry out the mission. Coverage was defined as how well the vehicle can get “eyes on
target” based on the type of sensors each vehicle carried. Finally, Capabilities was defined
as the vehicle’s appropriateness for the mission. Each vehicle had a set of strengths and
weaknesses (e.g., can travel long distances, stealthy, or weaponized) that could affect
Capabilities.

2.2.1.1 Simulator Video Window

The simulator video window (Fig. 2) showed participants a base map, including each
normal vehicle patrol route (interior road, perimeter, harbor, and sea lanes). The map was
always displayed in the same orientation (north was always up). Locations of a garage and
dock are denoted by labeled boxes. These locations house vehicles that are not currently
assigned to the plan. UxVs were labeled using the middle letter of the appropriate vehicle
acronym (A = UAV, G = UGV, S = USV) followed by a vehicle number (e.g., UAV1).
Vehicle names remained constant during the experiment. Several smaller tiles were
overlaid on the map, including a “play detail” tile (top left) that showed the play name and
a visual representation of the current play, detailing active vehicle movement (colored to
match the active play). In the video window, all vehicles began performing the “normal
full coverage patrol base defense” play (always blue). An Intel history tile (bottom left)
displayed previously acknowledged Intel. Messages from the base commander were
prioritized and listed separately. Participants were given a scroll bar and could scroll if too
much information was displayed in a box. A vehicle status tile (center right) identified
which vehicles were in use (assigned to the active play), in reserve (unassigned to play), or
out of service (grounded). Vehicles were displayed in one of 2 colors, the play’s color or
white (reserve or out of service). As participants watched the vehicles patrol the base, Intel
messages would arrive (Fig. 3), which froze the simulation until they were acknowledged
by clicking the Acknowledge box.


Fig.  2  Simulator  video  window  during  opening  video 


Fig.  3  Simulator  video  window  with  Intel  message 


2.2.1.2 Simulator Mission Assignment Window

The  simulator  mission  assignment  window  (Fig.  4)  was  composed  of  an  “alert”  pop-up  box 
that  described  the  participants’  mission  objective  (e.g.,  “There  is  a  ship  out  in  the  harbor 
near  the  North  Sea  Lane.  A  man  has  gone  overboard.  Send  the  best  vehicle(s)  to  coordinate 
the search for the man.”). Participants then clicked on the “Accept Mission” box, which
brought  up  the  decision  window. 
Fig. 4 Simulator video window showing mission objective box


2.2.1.3 Simulator Decision Window

The  simulator  decision  window  (Fig.  5)  displays  the  asset  capabilities  (top  left),  the  mission 
objective  (upper  center  left),  the  Intel  messages  (lower  center  left),  the  decision  box 
(bottom  left),  an  overview  of  Plan  A  (top  right),  and  an  overview  of  Plan  B  (bottom  right). 


Fig.  5  Simulator  decision  window 


2.2.2  Eye  Tracker 

The  SMI  (SensoMotoric  Instruments;  Berlin,  Germany)  Remote  Eye-tracking  Device 
(SMI  RED)  was  used  to  collect  ocular  indices  to  measure  both  visual  attention  and 
workload.  The  SMI  RED  system  uses  an  infrared  camera-based  tracking  system  and  allows 
for  noncontact  operation.  The  SMI  RED  uses  a  camera  mounted  under  the  computer 
monitor to track both the pupil and corneal reflection in both eyes. Eye movements were
sampled at a rate of 60 Hz (each eye), logged in real time, and synchronized
with the simulator.

2.2.3  Survey  and  Tests 

2.2.3.1 Demographics

A  demographics  questionnaire  (Appendix  A)  was  administered  at  the  beginning  of  the 
study.  This  survey  included  information  on  participants’  age,  gender,  education,  computer 
experience, and GE. Participants rated their computer experience and GE on a 6-point Likert-type
scale (never, rarely, every few months, monthly, weekly, or daily). Frequent video
gamers  were  categorized  as  individuals  who  reported  playing  either  weekly  or  daily 
whereas  nongamers  were  individuals  who  selected  any  other  choice. 
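
The categorization rule above can be written as a short function. This is a minimal sketch; the function name and the group labels are hypothetical and chosen here only for illustration.

```python
# Response options of the 6-point frequency scale, as listed in the text.
FREQUENCY_SCALE = ("never", "rarely", "every few months",
                   "monthly", "weekly", "daily")
# Responses that the text counts as frequent video gaming.
FREQUENT_RESPONSES = {"weekly", "daily"}


def gaming_group(response):
    """Classify a questionnaire response as frequent gamer or nongamer."""
    normalized = response.strip().lower()
    if normalized not in FREQUENCY_SCALE:
        raise ValueError(f"unknown response: {response!r}")
    return "frequent gamer" if normalized in FREQUENT_RESPONSES else "nongamer"
```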

2.2.3.2 Color Vision Screening

Participants  were  given  a  screening  for  color  deficiencies  prior  to  participation  using 
Ishihara  color  plates.  Nine  PowerPoint  slides  were  shown  to  participants  and  only 
individuals  who  correctly  answered  at  least  8  out  of  9  were  included  in  the  study. 

2.2.3.3 Trust Measures

We measured trust in 2 different ways in the current study. First, we measured participants’
objective decision-making performance. Second, we measured participants’ self-reported
perceived trust in the IA, which is subjective in nature, using a questionnaire (Appendix
B). Objective decision-making performance was measured by participants’ reliance on or
rejection of the IA’s prioritization of Plan A. The IA always presented Plan A as its
indicator of the best option, and participants were given a choice of accepting the IA’s
recommendation of Plan A (indicating trust in the agent) or choosing the alternative option,
Plan B, which indicated distrust in the IA. Participants made their decision by selecting
one of the plan buttons displayed on the interface (Fig. 5).

Previous trust frameworks refer to an operator’s appropriate reliance on the agent as
calibrated trust (Hancock et al. 2011; Lee and See 2004) or appropriate learned trust (Hoff
and Bashir 2015). Under these frameworks, the participant exhibits appropriate trust by choosing
Plan A when the IA is correct and Plan B when it is not; in other words, the ideal state,
from a signal detection theory perspective, is when a participant makes only hits and
correct rejections. However, the participant may over-rely on the system and thus misuse
it. In these situations, the participant would demonstrate high trust, but that trust would be
associated with degraded performance (such as a high false alarm rate). Finally,
participants may not trust the system and disuse it, even when Plan A is correct. In this
situation, low trust would also be associated with degraded performance (such as a higher miss rate;
Table 1).
Table 1 Operational definitions of automation usage decisions used in the current study

Correct Plan   Automation   Operator   Usage                  SDT*
A              A            A          Proper IA use          Hit
B              A            B          Correct IA rejection   Correct rejection
A              A            B          IA disuse              Miss
B              A            A          IA misuse              False alarm

* SDT = signal detection theory.
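
The mapping in Table 1 can be expressed as a small classifier. The sketch below assumes each trial is recorded as a pair (correct plan, operator choice); the function names and the example trials are hypothetical and are not data from the study.

```python
from collections import Counter

# The IA always recommends Plan A (as stated in the text), so the outcome
# depends only on which plan was correct and which plan the operator chose.
IA_RECOMMENDATION = "A"


def classify_decision(correct_plan, operator_choice):
    """Map one trial onto the SDT categories of Table 1."""
    ia_correct = correct_plan == IA_RECOMMENDATION
    accepted = operator_choice == IA_RECOMMENDATION
    if ia_correct and accepted:
        return "hit"                # proper IA use
    if ia_correct and not accepted:
        return "miss"               # IA disuse
    if not ia_correct and accepted:
        return "false alarm"        # IA misuse
    return "correct rejection"      # correct IA rejection


def tally(trials):
    """Count SDT outcomes over (correct_plan, operator_choice) pairs."""
    return Counter(classify_decision(c, o) for c, o in trials)


# Hypothetical trials, for illustration only.
example_trials = [("A", "A"), ("B", "B"), ("A", "B"), ("B", "A"), ("A", "A")]
counts = tally(example_trials)
```

Under this coding, a perfectly calibrated operator would produce only hits and correct rejections.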


Subjective  trust  was  measured  using  the  automation  trust  scale  developed  by  Jian  et  al. 
(2000).  However,  this  measure  does  not  account  for  the  types  of  automation  suggested 
previously in the literature. For example, one seminal article (Parasuraman et al. 2000)
proposed  that  each  stage  of  information  processing  can  be  automated:  1)  information 
acquisition  (i.e.,  sensory  processing),  2)  information  analysis  (i.e.,  perception),  3)  decision 
and  action  selection,  and  4)  action  implementation  (i.e.,  response  selection).  To  this  end, 
we  combined  the  types  of  automation  (Parasuraman  et  al.  2000)  with  the  Jian  et  al.  (2000) 
automation  trust  scale  by  asking  each  of  the  trust  questions  for  each  part  of  the  information 
processing  model.  The  current  study,  however,  only  manipulates  the  display  of  information 
already  gathered  (trust  during  information  analysis)  and  performs  decision  and  action 
selection.  Consequently,  we  have  only  analyzed  those  scales  and  excluded  information 
acquisition  and  action  implementation  from  the  current  study.  Each  item  was  scored  on  a 
7-item  Likert-type  scale  (1  =  not  at  all;  7  =  extremely). 

2.2.3.4 Response Time

Response  time  is  defined  as  the  time  from  when  the  decision  window  first  appeared  to  the 
moment  when  the  participant  clicked  one  of  the  plan  decision  buttons  and  was  measured 
directly  from  the  simulation. 

2.2.3.5 Workload Measures

We measured both objective and subjective workload. Objective workload was measured
using eye tracking measures (e.g., fixation duration and pupil diameter). Subjective
workload was measured using the National Aeronautics and Space Administration Task Load
Index (NASA-TLX) (Hart and Staveland 1988; see Appendix C), a self-report questionnaire.
The NASA-TLX yields a total weighted workload score based on 6 subscales: mental,
physical, and temporal demands, as well as effort exerted, self-performance evaluation,
and frustration felt during the task. Participants rated these 6 subscales on a continuous
scale from 0 to 100, where lower scores indicate lower workload and higher scores
indicate higher workload from that factor. Next, participants completed 15 pairwise
comparisons between the scale dimensions (e.g., effort vs. performance), in which each
scale appears 5 times. Participants were instructed to pick the one factor that contributed
more to their sense of workload during the task. To determine each subscale’s weighting, the
number of times each factor was chosen is divided by 15 (the total number of comparisons).
Each subscale score is then multiplied by its weight to calculate the scale’s weighted score.
For example, if the Mental Demand scale was rated in the middle of the scale with a score of
50 and then chosen 5 of the 15 times during the pairwise comparisons, it would have a
weighting of 0.33, which would be multiplied by its score of 50; thus, this factor would
have a weighted score of 16.50.
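
The weighting procedure described above can be sketched in a few lines. This is an
illustrative sketch only; the subscale keys and function names are assumptions made for
the example and do not come from the study materials.

```python
from typing import Dict

# Hypothetical keys for the 6 NASA-TLX subscales.
TLX_SUBSCALES = ("mental", "physical", "temporal",
                 "effort", "performance", "frustration")
N_COMPARISONS = 15  # 6 subscales compared pairwise: 6 * 5 / 2


def tlx_weighted_scores(ratings: Dict[str, float],
                        tally: Dict[str, int]) -> Dict[str, float]:
    """Multiply each 0-100 rating by its weight (times chosen / 15)."""
    if sum(tally[name] for name in TLX_SUBSCALES) != N_COMPARISONS:
        raise ValueError("pairwise tallies must sum to 15")
    for name in TLX_SUBSCALES:
        if not 0 <= tally[name] <= 5:
            raise ValueError("each subscale appears in only 5 comparisons")
        if not 0 <= ratings[name] <= 100:
            raise ValueError("ratings must lie between 0 and 100")
    return {name: ratings[name] * tally[name] / N_COMPARISONS
            for name in TLX_SUBSCALES}


def tlx_total(ratings: Dict[str, float], tally: Dict[str, int]) -> float:
    """Total weighted workload: the sum of the weighted subscale scores."""
    return sum(tlx_weighted_scores(ratings, tally).values())
```

Note that the sketch uses the unrounded weight 5/15, so a Mental Demand rating of 50
chosen 5 times gives approximately 16.67; the value of 16.50 in the text results from
rounding the weight to 0.33 before multiplying.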

2.2.3.6 Eye Tracking Measures

We measured 2 ocular indices of workload: fixation duration and pupil diameter. Both
measures were captured only during the decision window to determine operators’ workload
during the decision process while they were interacting with the 3 transparency conditions.
Both indices were averaged over the duration of each decision for each transparency level
condition. Fixation duration, measured in milliseconds, is the time between saccades, when
the eye is relatively still, during which visual information is processed (Holmqvist and
Nystrom 2011). Longer durations have been found to be associated with increased
workload and cognitive processing (e.g., Yang et al. 2014). Pupil diameter, measured in
millimeters, is the size of the pupil measured horizontally. Light levels were held constant
throughout the experiment as changes in luminance can affect pupil size. Additionally,
larger pupil diameters are associated with increased arousal and workload (Holmqvist
et al. 2011).

2.2.3.7 System Usability Scale

The System Usability Scale (SUS) (Brooke 1996) is a 10-question scale designed to
measure users’ overall feelings of usability (efficiency, efficacy, and satisfaction) with the
interface. The SUS is scored on a 5-point Likert scale (1 = strongly disagree; 5 = strongly
agree) with half of the scale reverse-coded (Appendix D).
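
For illustration, the conventional SUS scoring procedure can be sketched as follows. This
sketch assumes the standard item order, in which odd-numbered items are positively worded
and even-numbered items are reverse-coded; the report does not state that this exact
procedure was used, and the function name sus_score is hypothetical.

```python
from typing import Sequence


def sus_score(responses: Sequence[int]) -> float:
    """Conventional SUS score (0-100) from 10 responses on a 1-5 scale.

    Assumes the standard item order: odd-numbered items are positively
    worded, even-numbered items are reverse-coded.
    """
    if len(responses) != 10:
        raise ValueError("the SUS has 10 items")
    total = 0
    for item, response in enumerate(responses, start=1):
        if not 1 <= response <= 5:
            raise ValueError("responses must lie between 1 and 5")
        # Odd items contribute (response - 1); even items contribute (5 - response).
        total += (response - 1) if item % 2 == 1 else (5 - response)
    return total * 2.5
```

Under this convention, each item contributes 0 to 4 points, and the sum is multiplied by
2.5 to give a score from 0 to 100.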

2.2.3.8 Attentional Control Survey

The attentional control survey (Derryberry and Reed 2002) is a 21-item survey scored on
a 4-point Likert-type scale (almost never, sometimes, often, or always) to measure focused,
selective, and divided attentional control (Appendix E).

2.2.3.9 Spatial Ability Measures

We used 3 measures of spatial ability. The first test was the Cube Comparison Test
(Ekstrom et al. 1976; see Appendix F). This test displays sets of cube pairs, and participants
must mentally rotate the cubes to determine if they are the same cube from different
orientations or different cubes. Second, the Spatial Orientation Test (Appendix G),
modeled after the Cardinal Direction Test (Gugerty and Brooks 2004), evaluates
participants’ reorientation from an egocentric view. Participants view both a third-person
view of a plane and a first-person view of a building. Participants have to determine
the orientation of the building given the orientation of the plane. Finally, we used the Sense
of Direction Scale (Kato and Takeuchi 2003; see Appendix H), which is a self-report
17-item survey, measured on a Likert scale, that measures 2 facets of spatial ability:
memory for usual spatial behavior and direction of orientation. These measures assessed 2
related but distinct components of spatial ability: spatial visualization (SpaV), which is the
mental rotation of objects, and spatial orientation (SpaO), which is the reorientation of an
environment (Hegarty and Waller 2004).

2.2.3.10  Working  Memory  Capacity 

We  used  a  version  of  the  OSPAN  task  (Conway  et  al.  2005)  to  measure  WMC.  Participants 
alternated  between  solving  a  math  problem  in  which  they  were  instructed  to  press  the  space 
bar  if  the  value  of  the  equation  equaled  zero  and  being  presented  with  a  word.  Sequence 
length  was  computer  adaptive  and  increased  with  correct  answers.  After  completing  a 
sequence,  participants  were  asked  to  recall  the  first  letter  of  each  word  in  the  order  of 
presentation.  The  average  letters  correct  per  sequence  was  used  as  our  measure  of  WMC. 

2.2.3.11  Personal  Involvement 

We created a novel measure of personal involvement (Appendix I) based on
Zaichkowsky’s (1985) Involvement with Advertisements Measure, which consisted of 6
questions scored on a 7-point Likert-type scale (1 = not at all; 7 = extremely). Personal
involvement in the task, or task engagement, was used as a potential covariate in the current
study.

2.2.3.12  Structured  Strategy  Interview 

To determine participants’ decision-making processes, after each block we asked them to
assign a numerical value to each element of the interface on a 7-point Likert-type scale
(1 = not at all; 7 = extremely) and to answer a series of qualitative, open-ended questions
that asked them to describe the strategy they used during the previous block of trials
(Appendix J). The experimenter took notes during the interview and completed a strategy
sheet that was coded thematically.

2.3  Experimental  Design 

The experiment was a within-subjects design with 3 levels of IA transparency (SAT Level
1, SAT Level 1+2, and SAT Level 1+2+3) based on the SAT model (Chen et al. 2014).
Transparency level was counterbalanced using a Latin square block design (Williams
1949). Participants completed 3 separate blocks, each consisting of 8 mission decisions at
a single transparency level. The IA was incorrect 3 times in each block, yielding a
reliability rate of 62.5% (5 of 8 decisions). This rate was chosen based on Wickens and
Dixon’s (2007) finding that a 70% reliability rate is the point at which unreliable
automation was worse than a lack of automation in terms of performance.
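
As an illustration of the counterbalancing and reliability arithmetic, the sketch below
builds a basic cyclic Latin square over the 3 transparency levels. It is a simplified
stand-in: a Williams (1949) design additionally balances first-order carryover, and the
report does not describe how participants were assigned to orders, so the assignment rule
and all names here are assumptions.

```python
from typing import List

LEVELS = ["SAT Level 1", "SAT Level 1+2", "SAT Level 1+2+3"]


def cyclic_latin_square(conditions: List[str]) -> List[List[str]]:
    """Each condition appears once in every row and every column."""
    n = len(conditions)
    return [[conditions[(row + col) % n] for col in range(n)]
            for row in range(n)]


def block_order(participant_index: int, conditions: List[str]) -> List[str]:
    """Assumed rule: cycle participants through the rows of the square."""
    square = cyclic_latin_square(conditions)
    return square[participant_index % len(conditions)]


def reliability_rate(n_decisions: int, n_incorrect: int) -> float:
    """Proportion of decisions on which the IA's top choice was correct."""
    return (n_decisions - n_incorrect) / n_decisions
```

With 8 decisions per block and 3 incorrect recommendations, reliability_rate(8, 3)
returns 0.625, matching the 62.5% reported above.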

2.3.1  Transparency  SAT  Level  1 

Transparency SAT Level 1 (Fig. 6) provided participants only with basic plan information.
Participants were given the plan detail tile (top left), which displayed the play icon and
play name; an informative bar at the bottom of the screen that displayed a 1- to 2-sentence
summary of the current plan; and the vehicle status tile (center right). Additionally, the map
displayed the current status of the vehicles, their location and projected paths (represented
as dashed lines), and areas of interest (e.g., targets, boats, and search areas).


Fig.  6  Plan  showing  transparency  SAT  Level  1  condition 


2.3.2  Transparency  SAT  Level  1+2 

Transparency SAT Level 1+2 (Fig. 7) provided participants with all of the SAT Level 1
content and information regarding the IA’s rationale. Participants were given the plan
quality icon (sprocket); a text box describing factors that influenced the IA’s
recommendation of Plan A (speed, coverage, capabilities, and environment alerts); and the
IA’s judgment of vehicle appropriateness to the mission.

[Figure 7 image: plan display with the sprocket and a text table headed “This plan was
suggested because:” giving the Speed, Capabilities, and Coverage rationale.]

Fig. 7 Plan showing transparency SAT Level 1+2 condition. Labels indicate the location of the
sprocket, text table, and a potential environmental constraint.

The sprocket had several parts, each displaying 2 types of information. First, the wedge
size displayed the IA’s judgment of the importance of each of the plan’s evaluation metrics
(larger wedge = higher importance). Second, each metric was colored either green (good)
or yellow (average), based on the IA’s determination of likelihood of mission success. A
text box, displaying a written description of the plan’s speed, coverage, and capability, was
presented to participants underneath the sprocket. Speed was defined operationally as how
quickly the vehicles could arrive to begin and complete the mission. Coverage was defined
as the quality of sensor coverage provided during the mission. Capability was defined by
the specific strengths and weaknesses of each vehicle based on its equipment
(displayed in the asset capability tile). Mission appropriateness of each vehicle was
displayed to participants by manipulating vehicle icon size. In SAT Level 1+2, vehicle
icons could be either smaller or larger. A larger icon indicated that the vehicle was rated
as most appropriate for the mission. Finally, environmental constraints that the IA
considered in its plan rationale were displayed on the map using a unique icon.

2.3.3  Transparency  SAT  Level  1+2+3 

Transparency SAT Level 1+2+3 (Fig. 8) provided participants with all of the SAT Level
1+2 content and added projections of uncertainty to the interface. Three different types of
projections of uncertainty were provided through the interface: 1) plan metric uncertainty
(speed, coverage, and capabilities), shown as a transparent sprocket wedge and a bulleted
statement in the text table; 2) vehicle uncertainty, shown as a transparent vehicle icon; and
3) route uncertainty, shown as a transparent vehicle route. Participants were not shown
probabilities or likelihood comparisons; rather, they were shown only that the information
was uncertain. Plan metric uncertainty was used to display a specific uncertainty about
speed, coverage, or capability. For example, in Fig. 8, speed is green, meaning this metric
is well satisfied by this plan; however, it is also uncertain. The specific reason is listed in
the text box: the current environmental condition may slow vehicle A2 down, reducing speed.
The vehicle was uncertain because this condition may cause the vehicle to become less suitable
for the mission, and the route was uncertain because it may have variability due to this same
environmental factor.


[Figure 8 image: Plan A display. The text table, headed “This plan was suggested because:”,
gives the Speed, Capabilities, and Coverage rationale for vehicle A1, with bulleted
uncertainty statements about the effect of fog.]

Fig. 8 Plan showing transparency SAT Level 1+2+3 condition

2.4  Procedure 

After participants completed the informed consent process and were given a brief overview of
the study, they completed a demographics questionnaire and a brief color-vision screening.
Next, participants received experimenter-guided training that explained the tasks and
knowledge needed to complete the study, including the interface. The training consisted of
PowerPoint slides, 9 training missions (3 for each transparency level), and feedback
performed using the simulator. The slides informed participants that the IA was not always
100% accurate but was reliable. The training session lasted approximately 45 min. After
each training block, participants completed a brief structured interview about the strategy
they used. Following training, participants received 18 evaluation missions. During the
evaluation, participants were required to choose correctly on 12 or more missions to move
on to the experimental missions. The evaluation lasted approximately 40 min. Participants
were then given a 5-min break, after which the eye tracker was calibrated, a process that
consisted of the participants following a cursor around the screen with their eyes. Once the
eye tracker was calibrated, participants moved on to the experimental missions.

Participants completed 3 blocks of experimental missions, one for each transparency level.
Each mission was divided into 3 phases. In the first phase, participants viewed
a team of UxVs (UGVs, UAVs, and USVs) as they patrolled the base perimeter for 45 s at
the beginning of each mission. During this base defense task, participants received 4 Intel
messages from either different patrols or the base commander, 2 of which had relevant
Intel for the upcoming mission. Intel order was randomized. When an Intel message
appeared, the simulation paused to allow the participants to read and acknowledge
each one individually. One message appeared every 9 s during the simulation, and the
mission briefing appeared 9 s after the final Intel message, at 45 s. During phase 2, which
followed the 45-s observation task, participants received a mission briefing with a specific
objective. Participants were required to read and acknowledge this briefing. After
acknowledging the mission objective, the participant entered the final, decision phase of
the mission. The participant was presented with the decision window (Fig. 5), and the
intelligent agent recommended 2 plans: Plan A (the agent’s top choice) and Plan B (the
agent’s back-up choice). Participants chose between the 2 plans, and the next mission
then began. Participants completed 3 blocks of 8 missions (1 block for each transparency
level), which were counterbalanced. Plan A was correct 5 times in each 8-mission block. After
each block, participants completed the NASA-TLX, the trust survey, the personal involvement
survey, the verbal strategy questionnaire, and the SUS. The experimental session lasted
approximately 90 min, and the entire experiment lasted approximately 4 h.

3.  Results 


We present the results from a series of analyses of variance (ANOVAs) and multivariate
analyses of variance (MANOVAs) across all dependent variables of interest: objective
trust, subjective trust, response time, workload, and system usability. We also conducted a
series of mixed ANOVAs on all of the individual differences variables for both objective
trust and workload data across all transparency levels. We report results for SpaO, SpaV,
WMC, and action GE, with groups formed by a median split on each individual
difference factor. Mixed ANOVAs were conducted because we used both a within-subjects
variable (transparency information) and between-subjects variables for the individual
differences metrics (e.g., spatial ability and working memory).

All post hoc comparisons used a Bonferroni correction. Prior to all analyses, we screened
for outliers and for violations of the assumption of multivariate normality; no significant
deviations were noted. We report effect sizes in terms of η² instead of partial η², as these
can more easily be converted and compared across studies (Levine and Hullett 2002).
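
To make the correction and the two effect size definitions concrete, a minimal sketch
follows. It assumes the common form of the Bonferroni adjustment in which each p value is
multiplied by the number of comparisons and capped at 1; the sums of squares passed to the
effect size functions are placeholders, not values from this study.

```python
from itertools import combinations
from typing import List

LEVELS = ["SAT Level 1", "SAT Level 1+2", "SAT Level 1+2+3"]

# The 3 transparency levels give 3 pairwise post hoc comparisons.
PAIRS = list(combinations(LEVELS, 2))


def bonferroni_adjust(p_values: List[float]) -> List[float]:
    """Multiply each p value by the number of comparisons, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]


def eta_squared(ss_effect: float, ss_total: float) -> float:
    """Eta squared: effect variance as a share of total variance."""
    return ss_effect / ss_total


def partial_eta_squared(ss_effect: float, ss_error: float) -> float:
    """Partial eta squared: other effects are excluded from the denominator."""
    return ss_effect / (ss_effect + ss_error)
```

Because adjusted values are capped at 1, a nonsignificant comparison can be reported with
p = 1.00 under this form of the correction.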

3.1  Objective  Trust 

We report 2 separate analyses of objective trust. First, we conducted a signal detection
theory analysis using the raw hit and false alarm data. Second, we calculated the rates
of proper IA usage and proper IA disuse (i.e., correct IA rejection) to determine the effect
of increasing transparency on calibrated trust.

3.1.1  Signal  Detection  Analysis 

We used signal detection theory (SDT) to analyze participants’ sensitivity to the IA’s
accuracy. We computed 2 indices of perceptual sensitivity from the hit (proper IA usage)
and false alarm (IA misuse) data. The first index was the parametric index d', and
the second was the nonparametric index P(A), which is an estimate of the area under the
Receiver Operating Characteristic curve described by a single hit and false alarm pair, also
referred to as A' (Pollack and Norman 1964). The main advantage of using A' is that
corrections do not have to be used in cases with hit rates of “1.0” or false alarm rates of
“0” (e.g., Craig 1979; Davies and Parasuraman 1982); therefore, we report both metrics.
For d', in cases of hit rates of “1.0” or false alarm rates of “0”, we employed the correction
described by Macmillan and Creelman (2004), subtracting half of a hit and adding
half of a false alarm to the data. In addition to perceptual sensitivity, we also calculated a
measure of participants’ response bias, β.
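
The indices described above can be sketched as follows. This is an illustrative sketch
under stated assumptions, not the authors’ analysis code: A' uses the commonly cited
computational formula for a single hit and false alarm pair, β is the likelihood ratio at
the criterion, the half-count correction is also applied symmetrically to rates of exactly
0 hits or 100% false alarms (a case the text does not discuss), and all function names are
hypothetical.

```python
import math
from statistics import NormalDist

_z = NormalDist().inv_cdf  # inverse of the standard normal CDF


def corrected_rate(count: int, n_trials: int) -> float:
    """Replace rates of 0 or 1 by half a trial away from the extreme."""
    if count == 0:
        return 0.5 / n_trials
    if count == n_trials:
        return (n_trials - 0.5) / n_trials
    return count / n_trials


def d_prime(hits: int, n_signal: int, false_alarms: int, n_noise: int) -> float:
    """Parametric sensitivity: z(hit rate) - z(false alarm rate)."""
    return (_z(corrected_rate(hits, n_signal))
            - _z(corrected_rate(false_alarms, n_noise)))


def a_prime(hits: int, n_signal: int, false_alarms: int, n_noise: int) -> float:
    """Nonparametric sensitivity; no correction is needed at 0 or 1."""
    h, f = hits / n_signal, false_alarms / n_noise
    if h == f:
        return 0.5
    if h > f:
        return 0.5 + (h - f) * (1 + h - f) / (4 * h * (1 - f))
    return 0.5 - (f - h) * (1 + f - h) / (4 * f * (1 - h))


def beta(hits: int, n_signal: int, false_alarms: int, n_noise: int) -> float:
    """Likelihood ratio at the criterion: exp((z_F^2 - z_H^2) / 2)."""
    z_h = _z(corrected_rate(hits, n_signal))
    z_f = _z(corrected_rate(false_alarms, n_noise))
    return math.exp((z_f ** 2 - z_h ** 2) / 2)
```

In this study each block contained 5 missions in which Plan A was correct (signal trials)
and 3 in which it was not (noise trials), so a call such as d_prime(5, 5, 0, 3) exercises
the correction for both a perfect hit rate and a zero false alarm rate.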

3.1.1.1 Perceptual sensitivity (d')

The results of a repeated-measures ANOVA on d' showed a significant transparency level
effect, F (2,58) = 11.39, p < 0.001, η² = 0.28, where d' scores increased linearly with
transparency information (Fig. 9). The greatest d' scores were found for transparency SAT
Level 1+2+3 (M = 2.75, SD = 1.01), subsequently decreasing for SAT Level 1+2
(M = 2.16, SD = 1.15) and SAT Level 1 (M = 1.55, SD = 1.05). Post hoc tests using
Bonferroni alpha adjustments within SPSS software (referred to hereafter as post hoc tests)
indicated a significant difference between SAT Level 1 and SAT Level 1+2+3 (p < 0.001)
and a marginal difference between SAT Level 1 and SAT Level 1+2 (p = 0.06).

3.1.1.2 Perceptual sensitivity (A')

The results of a repeated-measures ANOVA on A' showed a significant effect of
transparency level, F (2,58) = 7.54, p = 0.001, η² = 0.21. A' scores increased linearly with
transparency information. The greatest A' scores were found for SAT Level 1+2+3
(M = 0.92, SD = 0.07), subsequently decreasing for transparency SAT Level 1+2
(M = 0.87, SD = 0.14) and SAT Level 1 (M = 0.81, SD = 0.16). Post hoc tests indicated a
significant difference between SAT Level 1 and SAT Level 1+2+3 (p = 0.002). Fig. 9
shows the means and standard errors for both d' and A' across the 3 transparency levels,
while means and standard deviations are shown in Table 2.

[Figure 9 image: d' and A' plotted by Transparency Level (Level 1, Level 1+2, Level 1+2+3).]

Fig. 9 Average d' and A' across transparency levels. Error bars indicate standard error of the mean
(SEM).


Table 2 Perceptual sensitivity and response bias data across transparency levels

Dependent Variable            Transparency       M (SD)         95% CI
Perceptual sensitivity (d')   SAT Level 1        1.55 (1.05)    [1.16; 1.94]
                              SAT Level 1+2      2.16 (1.15)    [1.73; 2.58]
                              SAT Level 1+2+3    2.75 (1.01)    [2.37; 3.13]
Perceptual sensitivity (A')   SAT Level 1        0.81 (0.16)    [0.75; 0.87]
                              SAT Level 1+2      0.87 (0.14)    [0.82; 0.92]
                              SAT Level 1+2+3    0.92 (0.07)    [0.90; 0.95]
Response bias (β)             SAT Level 1        -0.23 (0.89)   [-0.10; 0.57]
                              SAT Level 1+2      0.68 (1.10)    [0.27; 1.09]
                              SAT Level 1+2+3    0.06 (1.53)    [-0.51; 0.63]

CI = confidence interval
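
The interval widths in Table 2 appear consistent with a conventional t-based interval,
M ± t × SD/√n. The sketch below assumes n = 30 (inferred from the error degrees of freedom
of the F (2,58) tests, not stated in this section) and hard-codes the two-tailed 95%
critical value of t for 29 degrees of freedom; the function name ci95 is hypothetical.

```python
import math
from typing import Tuple

T_CRIT_DF29 = 2.045  # two-tailed 95% critical value of t, df = 29 (assumes n = 30)


def ci95(mean: float, sd: float, n: int = 30,
         t_crit: float = T_CRIT_DF29) -> Tuple[float, float]:
    """Return the (lower, upper) bounds of M +/- t * SD / sqrt(n)."""
    half_width = t_crit * sd / math.sqrt(n)
    return (mean - half_width, mean + half_width)
```

For the d' values in SAT Level 1 (M = 1.55, SD = 1.05), this gives bounds that round to
1.16 and 1.94, matching the first row of Table 2. Other rows may differ in the last digit
because the tabled means and SDs are themselves rounded.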


3.1.1.3 Response bias (β)

Response bias was calculated using the likelihood ratio β. Results revealed no significant
differences between participants’ response bias in SAT Level 1 (M = -0.23, SD = 0.89),
SAT Level 1+2 (M = 0.68, SD = 1.10), or SAT Level 1+2+3 (M = 0.06, SD = 1.53),
F (2,58) = 2.51, p = 0.090, η² = 0.080. Overall, across our 3 transparency levels, participants
were more likely to follow the IA’s recommendation than to reject it, which may be due to
our reliability manipulation. Response bias and sensitivity measures are shown in Table 2.

3.1.2 Proper IA Usage and Correct IA Rejection

Proper IA use and correct rejection rates represent a proportion of the 8 possible cases
during each block. A repeated-measures MANOVA on both proper IA use and correct IA
rejection rates across each transparency level was used to reduce pairwise error rate since
both measures were moderately correlated (r’s = 0.26 to 0.73), but not so strongly correlated
as to warrant creating a composite measure. This analysis revealed a significant
multivariate effect for transparency level using Wilks’ Lambda criteria, F (4,21) = 7.15,
p = 0.001, η² = 0.58, Λ = 0.42. Since the multivariate effect was significant, we considered
the univariate effects of each dependent variable separately.

3.1.2.1 Proper IA Usage Rates

Results for proper IA usage revealed a significant main effect of transparency level,
F (2,58) = 12.33, p < 0.001, η² = 0.30. The greatest rate of proper IA usage was found in
SAT Level 1+2+3 (M = 89%, SD = 12.15%), followed by SAT Level 1+2 (M = 87.44%,
SD = 12.60%), while SAT Level 1 had the lowest proper IA usage rate (M = 75.85%,
SD = 15.29%). Post hoc comparisons indicated participants’ proper IA usage rates were
significantly greater in SAT Level 1+2+3 (p < 0.001) and SAT Level 1+2 (p = 0.003)
compared with SAT Level 1. There was no significant difference in proper IA usage
rates between transparency SAT Level 1+2 and SAT Level 1+2+3 (p = 1.00).

3.1.2.2 Correct IA Rejection Rates

Results for correct IA rejection rates revealed a significant effect of transparency level,
F (2,58) = 15.03, p < 0.001, η² = 0.34. The highest correct rejection rates were found in
SAT Level 1+2+3 (M = 80.66%, SD = 19.97%), followed by SAT Level 1+2 (M = 67.11%,
SD = 17.75%), while SAT Level 1 had the lowest correct rejection rates (M = 54.50%,
SD = 20%). Post hoc comparisons indicated that participants’ correct IA rejection rates
were significantly greater in transparency SAT Level 1+2+3 than in SAT Level 1+2
(p = 0.04) and SAT Level 1 (p < 0.001). Furthermore, correct IA rejection rates in
transparency SAT Level 1+2 were significantly greater than in SAT Level 1 (p = 0.013).
Results for both proper IA use and correct IA rejection rates for each of the transparency
levels are displayed in Fig. 10.


Fig. 10 Proper IA usage and correct IA rejection scores across transparency levels. Error bars
indicate SEM.

Our next analysis was conducted on individual differences among groups. Our individual
differences analyses revealed a marginally significant interaction effect for working memory
with a small effect size, F (2,56) = 3.07, p = 0.054, η² = 0.01. The interaction effect
was driven by performance differences in SAT Level 1 between the low and high WMC
groups (Fig. 11; Table 3). Individuals in the low WMC group had a lower proportion of
correct rejections in SAT Level 1 (M = 0.47, SD = 0.18) than individuals in the high WMC
group (M = 0.64, SD = 0.19; d = 0.92). While this pattern flipped in SAT Level 1+2 and SAT
Level 1+2+3, these differences were small. Individuals in the low WMC group had a
slightly greater proportion of correct rejections in SAT Level 1+2 (M = 0.68, SD = 0.18;
d = 0.11) and SAT Level 1+2+3 (M = 0.82, SD = 0.19; d = 0.15) than individuals in the
high WMC group in SAT Level 1+2 (M = 0.66, SD = 0.18) and SAT Level 1+2+3 (M = 0.79,
SD = 0.22).

Fig. 11 Interaction of WMC on correct rejections across transparency level. Error bars indicate
SEM.

Table 3 Individual difference (ID) factors for proper IA use (PU), correct IA rejection (CR), and response time (RT)

ID Factor   Transparency       PU            CR            RT
Low SpaV    SAT Level 1        0.75 (0.20)   0.48 (0.20)   36.75 (18.00)
            SAT Level 1+2      0.87 (0.13)   0.71 (0.20)   38.58 (21.95)
            SAT Level 1+2+3    0.89 (0.14)   0.79 (0.20)   37.76 (22.63)
High SpaV   SAT Level 1        0.77 (0.09)   0.61 (0.19)   29.26 (15.04)
            SAT Level 1+2      0.87 (0.13)   0.63 (0.14)   24.49 (11.42)
            SAT Level 1+2+3    0.90 (0.11)   0.82 (0.20)   27.88 (12.06)
Low SpaO    SAT Level 1        0.74 (0.19)   0.51 (0.24)   30.28 (17.11)
            SAT Level 1+2      0.88 (0.12)   0.68 (0.18)   28.91 (17.88)
            SAT Level 1+2+3    0.90 (0.12)   0.80 (0.21)   28.21 (17.84)
High SpaO   SAT Level 1        0.78 (0.11)   0.58 (0.15)   35.73 (16.48)
            SAT Level 1+2      0.87 (0.13)   0.66 (0.18)   34.16 (19.60)
            SAT Level 1+2+3    0.88 (0.12)   0.81 (0.20)   37.43 (18.60)
Low WMC     SAT Level 1        0.74 (0.19)   0.47 (0.18)   32.57 (16.20)
            SAT Level 1+2      0.89 (0.13)   0.68 (0.18)   33.26 (19.93)
            SAT Level 1+2+3    0.92 (0.11)   0.82 (0.19)   32.43 (19.84)
High WMC    SAT Level 1        0.79 (0.09)   0.64 (0.19)   33.58 (18.06)
            SAT Level 1+2      0.86 (0.12)   0.66 (0.18)   29.28 (17.29)
            SAT Level 1+2+3    0.85 (0.13)   0.79 (0.22)   33.33 (17.41)
Non-AVGP    SAT Level 1        0.78 (0.17)   0.55 (0.21)   36.87 (17.89)
            SAT Level 1+2      0.87 (0.13)   0.69 (0.18)   30.32 (16.43)
            SAT Level 1+2+3    0.88 (0.13)   0.76 (0.19)   34.75 (19.00)
AVGP        SAT Level 1        0.73 (0.13)   0.54 (0.19)   27.20 (13.51)
            SAT Level 1+2      0.88 (0.13)   0.64 (0.18)   33.35 (22.17)
            SAT Level 1+2+3    0.90 (0.11)   0.87 (0.20)   29.92 (18.17)

Note: Values are M (SD). PU = proper IA use; CR = correct IA rejection; RT = response time;
SpaV = spatial visualization; SpaO = spatial orientation; WMC = working memory capacity;
AVGP = action video game player.


3.2  Subjective  Trust 


We conducted 2 separate between-subjects ANOVAs, one on the information analysis subscale
and one on the decision and action selection automation subscale. The need for
between-subjects analyses stems from previous research, which has indicated that trust
ratings can be biased by prior experience with a system (e.g., Hoff and Bashir 2015); thus,
we used only the first block of trials that each participant experienced. We analyzed trust
for each subscale separately instead of creating a combined score.

There were no significant differences across transparency levels for the information
analysis subscale, F (2,27) = 2.14, p = 0.14, η² = 0.14. Results did reveal a trend in which
trust in the system’s ability to integrate and display information increased as transparency
level increased. Trust was greatest in SAT Level 1+2+3 (M = 5.83, SD = 0.63), subsequently
decreasing in transparency SAT Level 1+2 (M = 5.51, SD = 0.73) and SAT Level 1
(M = 5.19, SD = 0.70). Results for the “suggesting or making decisions” subscale were
significant for transparency level, F (2,27) = 4.01, p = 0.03, η² = 0.23. Trust in the system’s
ability to suggest or make decisions increased as transparency level increased. Post hoc
analysis revealed that trust was significantly greater in SAT Level 1+2+3 (M = 5.47,
SD = 0.61) than in SAT Level 1 (M = 4.63, SD = 0.88; p = 0.031). No significant differences
were found between SAT Level 1+2 (M = 4.88, SD = 0.50) and SAT Level 1 (p = 1.0) or
SAT Level 1+2+3 (p = 0.20). Means, SDs, and CIs are displayed in Table 4.

Table 4 Means, SD, and confidence intervals (CIs) for trust subscales across transparency levels

Dependent Variable                 Transparency       M (SD)        95% CI
Information display and analysis   SAT Level 1        5.19 (0.70)   [4.69; 5.69]
trust subscale                     SAT Level 1+2      5.51 (0.73)   [5.00; 6.03]
                                   SAT Level 1+2+3    5.83 (0.63)   [5.37; 6.28]
Decision action selection trust    SAT Level 1        4.63 (0.88)   [4.00; 5.25]
subscale                           SAT Level 1+2      4.88 (0.50)   [4.52; 5.24]
                                   SAT Level 1+2+3    5.47 (0.61)   [5.03; 5.91]

3.3  Response  Time 

There was no significant difference in response time among the 3 transparency levels,
F (2,58) = 0.38, p = 0.69, η² = 0.02 (Table 5).

Table 5 PU, CR, and RT data across transparency level

Dependent Variable          Transparency       M (SD)          95% CI
Proper IA use rate          SAT Level 1        0.76 (0.15)     [0.70; 0.82]
                            SAT Level 1+2      0.87 (0.16)     [0.83; 0.92]
                            SAT Level 1+2+3    0.89 (0.12)     [0.84; 0.94]
Correct IA rejection rate   SAT Level 1        0.55 (0.20)     [0.47; 0.62]
                            SAT Level 1+2      0.67 (0.18)     [0.60; 0.74]
                            SAT Level 1+2+3    0.81 (0.20)     [0.73; 0.88]
Response time (s)           SAT Level 1        33.00 (16.73)   [26.76; 39.25]
                            SAT Level 1+2      31.53 (18.63)   [24.58; 38.50]
                            SAT Level 1+2+3    32.82 (18.51)   [25.91; 39.73]

ID analyses revealed a significant interaction effect for GE, driven by gamer differences in
SAT Level 1 and SAT Level 1+2+3 (Fig. 12). In SAT Level 1, AVGPs (M = 27.20, SD = 13.51)
responded more quickly than non-AVGPs (M = 36.87, SD = 17.89; d = 0.61). This pattern was
also found in SAT Level 1+2+3 but to a lesser degree; non-AVGPs (M = 34.75, SD = 19.00)
had longer response times than AVGPs (M = 29.92, SD = 18.16; d = 0.26), F(2,56) = 5.74,
p = .005, η² = 0.17.
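
The Cohen's d values above can be reproduced from the reported means and standard deviations. The sketch below assumes a pooled standard deviation with equal weight for the two groups (group sizes are not given in this section, so equal n is an assumption); the function name is hypothetical.

```python
import math


def cohens_d(mean_a: float, sd_a: float, mean_b: float, sd_b: float) -> float:
    """Cohen's d using a pooled SD that assumes equal group sizes."""
    pooled_sd = math.sqrt((sd_a ** 2 + sd_b ** 2) / 2)
    return abs(mean_a - mean_b) / pooled_sd


# Response times (s): non-AVGPs versus AVGPs, values taken from the text above.
d_level_1 = cohens_d(36.87, 17.89, 27.20, 13.51)
d_level_123 = cohens_d(34.75, 19.00, 29.92, 18.16)
print(round(d_level_1, 2), round(d_level_123, 2))  # prints: 0.61 0.26
```

Under this equal-weight pooling, both values match the reported effect sizes, which suggests the gaming-experience gap in response time was more than twice as large in SAT Level 1 as in SAT Level 1+2+3.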


Fig.  12  Interaction  of  GE  on  RT  across  transparency  level.  Error  bars  indicate  SEM. 

3.4  Objective  Workload 

We report the results of our eye tracking analysis after removing 5 participants pairwise
across all conditions due to missing data (n = 25). In terms of objective workload, we did
not find any effect of transparency level on either fixation duration (FD),
F(2,48) = 0.84, p = 0.44, η² = 0.03, or pupil diameter (PD), F(2,48) = 0.92, p = 0.91,
η² = 0.004.

We found an interaction effect of SpaV on FD. Individuals with low SpaV had longer FDs
in SAT Level 1 (M = 236.49, SD = 43.83) and SAT Level 1+2 (M = 250.84, SD = 57.27) than
individuals in the high SpaV group in SAT Level 1 (M = 218.42, SD = 42.19; d = 0.42)
and SAT Level 1+2 (M = 222.78, SD = 38.84; d = 0.57). Interestingly, this pattern reversed
in SAT Level 1+2+3, where the low SpaV group had shorter FDs (M = 229.73, SD = 49.78)
than the high SpaV group (M = 253.78, SD = 44.17; d = 0.51), F(2,46) = 6.19, p = 0.004,
η² = 0.20 (Fig. 13 and Table 6).


Fig.  13  Interaction  effect  of  spatial  ability  on  FD  across  transparency  level.  Error  bars  indicate  SEM. 


Table 6  Means and standard deviations for individual difference factors across transparency level for eye tracking variables

             SAT Level 1                    SAT Level 1+2                  SAT Level 1+2+3
ID Factor    FD               PD            FD               PD            FD               PD
Low SpaV     236.49 (43.83)   3.56 (0.52)   250.84 (57.27)   3.60 (0.51)   229.73 (49.78)   3.60 (0.67)
High SpaV    218.42 (42.19)   3.77 (0.36)   222.73 (36.84)   3.76 (0.35)   253.78 (44.17)   3.75 (0.36)
Low SpaO     223.39 (36.14)   3.47 (0.36)   233.15 (32.22)   3.53 (0.28)   235.32 (53.24)   3.50 (0.37)
High SpaO    233.45 (52.03)   3.90 (0.46)   242.70 (67.20)   3.87 (0.54)   248.85 (41.03)   3.90 (0.53)
Low WMC      216.46 (45.83)   3.68 (0.50)   229.37 (51.67)   3.71 (0.48)   231.98 (54.26)   3.68 (0.54)
High WMC     244.85 (34.04)   3.63 (0.40)   249.32 (46.49)   3.63 (0.40)   255.23 (33.92)   3.70 (0.40)
Non-AVGP     239.77 (38.00)   3.68 (0.50)   243.71 (47.79)   3.71 (0.49)   250.34 (47.91)   3.70 (0.55)
AVGP         206.56 (45.63)   3.63 (0.36)   226.04 (53.81)   3.61 (0.36)   225.16 (45.71)   3.61 (0.33)

Note: Values are M (SD). ID = individual difference, FD = average fixation duration, PD = average pupil diameter, SpaV = spatial visualization, SpaO = spatial orientation, WMC = working
memory capacity, AVGP = action video game player.


We found a main effect of SpaO on PD. PD was larger for the high SpaO group (M = 3.89,
SD = 0.17) than for the low SpaO group (M = 3.50, SD = 0.03) across all transparency levels.
The difference was larger in SAT Level 1 (d = 1.04) than in SAT Level 1+2+3 (d = 0.96) or
SAT Level 1+2 (d = 0.82), F(1,23) = 5.54, p = 0.027, η² = 0.19 (Fig. 14 and Table 6).




Fig.  14  Effect  of  spatial  ability  on  PD  across  transparency  level.  Error  bars  indicate  SEM. 

3.5  Subjective  Workload 

We conducted a 6 (TLX subscale) x 3 (transparency level) repeated-measures MANOVA
on the TLX subscales. The effect of the combined dependent variables was not significant
using the Wilks’ Lambda criterion, F(12,18) = 1.14, p = 0.39, η² = 0.43, Λ = 0.57. In addition,
no differences were found among the individual subscales using the univariate ANOVAs;
therefore, we did not interpret these results. Fig. 15 shows each TLX subscale by
transparency condition, and Table 7 displays means, standard deviations, and CIs across
transparency levels for all subscales and global workload.
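
The multivariate statistics above are mutually consistent if the test is treated as a single-root case (s = 1), where Wilks' Lambda converts to an exact F and the multivariate partial eta squared equals 1 − Λ. The sketch below illustrates that relation in Python; the single-root assumption and the function names are ours and are not stated in the report.

```python
def wilks_exact_f(wilks_lambda: float, df_hypothesis: int, df_error: int) -> float:
    """Exact F for Wilks' Lambda, valid under the assumed single-root case (s = 1)."""
    return ((1 - wilks_lambda) / wilks_lambda) * (df_error / df_hypothesis)


def wilks_partial_eta_squared(wilks_lambda: float) -> float:
    """Multivariate partial eta squared when s = 1 (assumption)."""
    return 1 - wilks_lambda


# Values reported above: Wilks' Lambda = 0.57 with df = (12, 18).
f_value = wilks_exact_f(0.57, 12, 18)
eta_sq = wilks_partial_eta_squared(0.57)
print(round(f_value, 2), round(eta_sq, 2))
```

The recomputed F is within rounding error of the reported F(12,18) = 1.14, since Λ itself is rounded to two decimals, and 1 − Λ reproduces the reported effect size of 0.43.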

Fig. 15 Average weighted TLX subscale means across each transparency level. Higher numbers
indicate greater workload, except for performance, where higher numbers indicate better perceived
performance. Error bars are SEM.

Table 7  Means, standard deviations, and 95% CIs for subjective workload data across transparency level

                     SAT Level 1                       SAT Level 1+2                     SAT Level 1+2+3
Dependent Variable   M (SD)          95% CI            M (SD)          95% CI            M (SD)          95% CI
Mental               21.16 (10.17)   [17.25; 24.95]    20.78 (9.46)    [17.25; 24.31]    21.60 (8.45)    [18.45; 24.76]
Physical             1.34 (3.71)     [-0.04; 2.73]     1.33 (2.41)     [0.23; 2.03]      0.76 (1.38)     [0.25; 1.28]
Temporal             9.08 (6.26)     [6.74; 11.42]     9.39 (6.57)     [6.94; 11.84]     9.69 (5.87)     [7.50; 11.88]
Effort               13.44 (5.84)    [10.74; 15.57]    13.16 (6.46)    [10.74; 15.57]    13.09 (6.27)    [10.74; 15.43]
Frustration          5.12 (6.63)     [2.65; 7.60]      4.27 (6.14)     [1.97; 6.56]      3.61 (5.69)     [1.49; 5.74]
Performance          9.37 (6.93)     [6.78; 11.95]     9.01 (6.68)     [6.52; 11.51]     8.90 (7.14)     [6.24; 11.57]
Total workload       59.51 (17.85)   [52.85; 66.18]    57.73 (18.25)   [50.92; 64.55]    57.66 (17.25)   [51.22; 64.10]

Because we did not find any statistically significant differences between conditions, we
collapsed across the 3 transparency level conditions and used an ANOVA to determine
whether significant differences existed between the TLX subscales for the experimental
task as a whole. The analysis revealed significant differences among the individual TLX
subscales, F(5,29) = 43.55, p < 0.001, η² = 0.56. Mental demand (M = 21.17, SD = 8.65)
was the greatest overall contributor to workload (all comparisons p < 0.001). The effort
subscale (M = 13.23, SD = 5.20) was the next greatest contributor to workload and was
greater than physical workload (M = 1.08, SD = 2.39, p < 0.001) and frustration
(M = 4.33, SD = 5.51, p < 0.001).

3.6  Usability 

We conducted a repeated-measures ANOVA on system usability scale (SUS) total scores. The
analysis revealed a significant effect of transparency level, F(2,58) = 5.70, p = 0.006,
η² = 0.11. Post hoc comparisons indicated that participants rated the system as more usable
in SAT Level 1+2+3 (M = 66.75, SD = 19.40, p = 0.02) and, marginally, in SAT Level 1+2
(M = 66.42, SD = 18.61, p = 0.07) than in SAT Level 1 (M = 61.83, SD = 20.77). No
significant difference was found between SAT Level 1+2 and SAT Level 1+2+3
(p = 1.00) (Table 8).
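
The 95% CIs listed in Table 8 can be approximated from the tabled means and standard deviations as M ± t·SD/√n. The sketch below is illustrative only: n = 30 is inferred from the t(29) tests reported elsewhere in this section, the critical value is hard-coded from a standard t table, and the function name is hypothetical.

```python
import math

# Two-tailed 95% critical value of Student's t for df = 29 (n = 30),
# hard-coded from a standard t table so the sketch needs no external library.
T_CRIT_DF29 = 2.045


def mean_ci(mean: float, sd: float, n: int, t_crit: float = T_CRIT_DF29):
    """Return (lower, upper) bounds of a t-based confidence interval for a mean."""
    half_width = t_crit * sd / math.sqrt(n)
    return mean - half_width, mean + half_width


# SAT Level 1 row of Table 8: M = 61.83, SD = 20.77.
low, high = mean_ci(61.83, 20.77, 30)
print(f"[{low:.2f}; {high:.2f}]")
```

The result lands within about 0.01 of the tabled interval [54.08; 69.59]; the small gap is expected because the tabled mean and SD are themselves rounded.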


Table 8  Means, SD, and CIs for SUS data across transparency levels

Dependent Variable   Transparency      M (SD)          95% CI
SUS total score      SAT Level 1       61.83 (20.77)   [54.08; 69.59]
SUS total score      SAT Level 1+2     66.42 (18.61)   [59.47; 73.37]
SUS total score      SAT Level 1+2+3   66.75 (19.40)   [59.51; 74.00]

3.7  Personal  Involvement 

We conducted a repeated-measures ANOVA on personal involvement scores to determine
if involvement varied across transparency levels. The analysis did not find any significant
differences across transparency levels, F(2,58) = 1.43, p = 0.247, η² = 0.047 (Table 9).


Table 9  Means, SD, and CIs for personal involvement data across transparency levels

Dependent Variable           Transparency      M (SD)         95% CI
Personal involvement score   SAT Level 1       30.40 (5.14)   [28.48; 32.32]
Personal involvement score   SAT Level 1+2     30.73 (4.74)   [28.96; 32.50]
Personal involvement score   SAT Level 1+2+3   31.43 (4.64)   [29.70; 33.16]

3.8 Decision-making Strategy

We divided these data into 2 separate analyses. Quantitative data were subjected to a series
of ANOVAs, while the qualitative data derived from the structured interviews were coded
into themes and analyzed descriptively.

3.8.1  Quantitative  Data 

We conducted a series of repeated-measures MANOVAs to determine if differences
existed in the participants’ perceived usefulness of the different display elements within
each level during the experiment.

3.8.1.1 SAT Level 1 information

Participants received SAT Level 1 information in every block of the experiment. They
received 1) the name of the current play (Play Name), 2) detailed maps of the plan to
complete the play (Play Details), 3) color-coded plans to indicate which vehicles were
included in the plan (Plan Colors), 4) the status of each vehicle (Vehicle Status), 5) a brief
summary of each plan (Information Bar), and 6) an experimental aid designed to reduce
participant workload by listing the strengths and weaknesses of each UxV (asset
capability tile). Therefore, we analyzed the results in a 6 (display elements) x 3
(transparency level) MANOVA. The analysis did not reveal any differences in
display element use rates of SAT Level 1 information between transparency levels using the
Wilks’ Lambda criterion, F(10,20) = [...], p = 0.39, η² = 0.36, Λ = 0.64. However, we
observed that the mean for the asset capability tile appeared to vary across transparency
levels, and the MANOVA may have concealed differences in the use of the asset
capability tile. Therefore, we conducted a separate univariate ANOVA for the asset
capability tile, and the analysis showed a significant difference, F(2,58) = 4.17,
p = 0.02, η² = 0.13. We therefore believe that the other nonsignificant findings masked
the differences for the asset capability tile (Fig. 16). The asset capability tile was
perceived as significantly more useful in transparency SAT Level 1 (M = 6.10; SD = 1.2;
p = 0.006) than in transparency Levels 2 (M = 5.50; SD = 1.5) or 3 (M = 5.47; SD = 1.5;
p = 0.25).

Fig. 16 Usefulness ratings across transparency level conditions specifically for Level 1 user interface
elements

In addition, usefulness ratings were assessed within each specific level using a
repeated-measures ANOVA to determine which specific display elements participants found
most useful for making their decisions. Within transparency SAT Level 1 specifically, a main
effect of information type was found, F(5,145) = 19.65, p < 0.001, η² = 0.40. The play
name was rated as significantly the least helpful piece of rationale information given to the
participant by the system (M = 1.87, SD = 1.28, p < 0.001), while the asset capability tile
was rated as the most helpful piece of rationale information given by the system in SAT
Level 1 (M = 6.10, SD = 1.90, p < 0.001). The remaining display elements did not differ
significantly from each other.

3.8.1.2 SAT Level 2 information

Participants received SAT Level 2 information in only 2 conditions of the experiment
(transparency Levels 2 and 3; Fig. 7). Participants received 6 pieces of SAT Level 2 rationale
information: 1) the size of the vehicle to indicate UxV capabilities (Vehicle Size);
2) environmental overlays on the maps (e.g., wind, fog, roadway debris; icons on map
[alerts]); 3) the sprocket, which was divided into an overall rating as well as 4) Wedge Size
and 5) Wedge Color; and 6) IA reasoning information provided in a table for each plan (Table
Text). Therefore, we analyzed the results in a 6 (display elements) x 2 (transparency level)
MANOVA. The analysis did not reveal any differences in information use rates of
SAT Level 2 information between transparency levels using the Wilks’ Lambda criterion,
F(4,25) = 0.69, p = 0.61, η² = 0.10, Λ = 0.91. A comparison of the means revealed no
significant difference and almost identical ratings between transparency SAT Level 2 and
SAT Level 3; therefore, SAT Level 2 information was used similarly across both
transparency levels (Fig. 17).



Fig.  17  Usefulness  ratings  across  transparency  level  conditions  specifically  for  SAT  Level  2  user 
interface  elements 

Additionally, usefulness ratings were assessed within each specific level, using a
repeated-measures ANOVA, to determine which specific display elements participants found
most useful for making their decisions. Within transparency SAT Level 2 specifically, a main
effect of information type was found, F(5,145) = 9.66, p < 0.001, η² = 0.25. The sprocket
(M = 5.96, SD = 1.29, p < 0.001) and the text table (M = 5.75, SD = 1.53, p < 0.001) were
perceived as the most useful display elements given to the participant by the system. Within
the sprocket, the wedge color (M = 5.62, SD = 1.50) was seen as significantly more helpful
than the wedge size (M = 5.28, SD = 1.44), t(29) = 2.28, p = 0.03, Cohen’s d = 0.23. The
remaining display elements did not differ significantly from each other.

3.8.1.3 SAT Level 3 information

Participants received SAT Level 3 information only in the transparency SAT Level 3 block
of the experiment, and we were primarily concerned with the transparency of the sprocket
and the text table (Fig. 8); therefore, we analyzed the results using a paired t-test. The
analysis did not reveal any difference between the participants’ information use rates of
the sprocket displaying uncertainty and the table displaying uncertainty,
t(29) = 1.14, p = 0.27, Cohen’s d = -0.20, indicating that these 2 displays of uncertainty
did not differ in usefulness (Fig. 18).

Fig. 18 Usefulness ratings across transparency level conditions specifically for SAT Level 3 user
interface elements

3.8.2  Qualitative  Data:  Differences  between  Training  and  Experiment 

We  also  used  information  gleaned  from  structured  interviews  to  determine  if  there  was  a 
strategy  change  between  training  and  the  experimental  sessions.  During  training, 
participants  may  use  many  strategies  and  through  experience  may  change  those  strategies 
based  on  the  feedback  given.  It  was  expected  that  during  the  experimental  blocks 
participants  would  discontinue  strategies  that  were  problematic  during  training  given  that 
only  those  who  do  well  on  the  evaluation  phase  of  the  experiment  continue  to  this  phase. 
We  used  the  qualitative  data  to  support  this  analysis  by  looking  at  the  number  of  strategies 
and  the  change  between  strategies  from  training  to  experimental  sessions.  Answers  to  open- 
ended  questions  on  the  strategy  interview  were  grouped  into  strategies  and  analyzed  by 
testing  the  number  of  strategies  between  training  and  experimental  blocks  across  all 
transparency  levels  during  the  experiment. 

3.8.2.1 SAT Level 1 Strategies

A repeated-measures MANOVA was conducted to determine the differences between
training and experiment on SAT Level 1 information, and no differences were found as a
main effect, F(5,25) = 1.35, p = 0.28, η² = 0.21, Λ = 0.78, or as an interaction between
transparency level and session, F(10,110) = 1.00, p = 0.41, η² = 0.08, Λ = 0.84. We
report partial eta squared as computed by SPSS for our MANOVA results (Fig. 19).

Fig. 19 Number of participants using specific strategies during both training and experiment for the
transparency SAT Level 1 condition


The qualitative data indicate that while participants perceived no differences in
usefulness between the different parts of the interface, they used that information in
different ways. For SAT Level 1, participants tried to use the differences between the 2
plans during training, whereas they relied much more on Intel as they progressed through
training and into the experimental blocks.


3.8.2.2 SAT Level 2 Strategies

A repeated-measures MANOVA was conducted to determine the differences between
training and experiment on SAT Level 2 information, and no differences were found as a
main effect, F(4,25) = 1.12, p = 0.38, η² = 0.15, Λ = 0.85, or as an interaction between
transparency level and session, F(4,25) = 0.82, p = 0.41, η² = 0.06, Λ = 0.94 (Fig. 20).

Fig. 20 Number of participants using specific strategies during both training and experiment for the
transparency SAT Level 1+2 condition

In SAT Level 2, during training, participants used the sprocket and tried to understand
capabilities, while during the experiment they used the Intel, compared the plans, and then
used the sprocket to finalize their decision; others used only the sprocket.

3.8.2.3 SAT Level 3 Strategy

A repeated-measures MANOVA was conducted to determine the differences between
training and experiment on SAT Level 3 information, and no differences were found as a
main effect, F(2,28) = 1.56, p = 0.23, η² = 0.10, Λ = 0.90 (Fig. 21).

Fig. 21 Number of participants using specific strategies during both training and experiment for the
transparency SAT Level 1+2+3 condition

The strategies used in SAT Level 3 were similar to the strategies used in SAT Level 2,
except that many more people tried to compare the plans during training, while during the
experiment they relied much more heavily on the sprocket and started using the table
because of the additional information that it provided.




4.  Conclusions  and  Discussion 


The current study had several goals. The primary objective was to study the effect of
transparency level on user trust and to determine any potential performance trade-offs that
may occur with respect to response time or workload. Across all mission events,
participants were assisted by an IA that presented them with 2 plans with which to achieve
the mission objectives and satisfy the commander’s intent. Participants had to weigh vehicle
capabilities, locations, and Intel against the IA’s assessment of plan success, potential
uncertainties, and recommendations to achieve mission success. The IA made optimal
recommendations 62.50% of the time; failure to do so was due to information that had not
yet been processed. Thus, the primary measure of trust was the participant’s automation
usage decision to accept or reject the system’s recommendation of Plan A. Secondary goals
of the study included an exploration of the effects of an operator’s perception of system
usability and the implications of individual differences (spatial ability, WMC, and action
GE) on trust, workload, and ratings of the utility of each display element used in the
experiment.

The objective trust data supported our hypothesis that increases in information to support
operator transparency lead to more proper IA uses and correct IA rejections in Level
1+2 and even more in Level 1+2+3, which means that both disuse and misuse decisions
decreased across transparency levels as well. The addition of reasoning information in
Level 1+2 increased the proper IA use rate by 11 percentage points and the correct
rejection rate by 12 percentage points. The addition of both reasoning and uncertainty
information in Level 1+2+3 improved proper IA use by an additional 2 percentage points
and the correct rejection rate by an additional 14 percentage points. Taken together, the
proper IA usage and correct IA rejection percentages indicate that objective trust calibration
increased linearly as a function of transparency level, with Level 1+2+3 proper IA usage
rate at 90% and correct IA rejection rate at 81%. This increase suggests that incorporating
information regarding both reasoning and uncertainty into heterogeneous tactical decision
making allowed our participants to calibrate their trust in the IA more accurately, as
shown by more accurate performance when making tactical decisions. These results are
consistent with Helldin et al.’s (2014) findings that information that supported increased
transparency also increased task performance, as well as with Finger and Bisantz’s (2002)
findings that displaying uncertainty information can support operator decision making.
This relationship between performance and agent transparency was also supported by our
SDT analysis. Level 1+2+3 yielded both the highest d' and A' values,
indicating the greatest level of perceptual sensitivity and paralleling the findings of the
objective trust data.
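
To illustrate how d' and A' behave for rates of this size, the sketch below applies the standard signal detection formulas to the group means in Table 5. It is not the report's SDT analysis: it assumes that a hit is a proper IA use and that a false alarm is accepting an incorrect recommendation (1 minus the correct rejection rate), and it works on group means rather than participant-level data. The function names are hypothetical.

```python
from statistics import NormalDist


def d_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Parametric sensitivity: z(hit rate) minus z(false-alarm rate)."""
    z = NormalDist().inv_cdf
    return z(hit_rate) - z(false_alarm_rate)


def a_prime(hit_rate: float, false_alarm_rate: float) -> float:
    """Nonparametric sensitivity A'; this form assumes hit_rate >= false_alarm_rate."""
    h, f = hit_rate, false_alarm_rate
    return 0.5 + ((h - f) * (1 + h - f)) / (4 * h * (1 - f))


# (proper IA use rate, correct IA rejection rate) group means from Table 5.
rates = {
    "SAT Level 1": (0.76, 0.55),
    "SAT Level 1+2": (0.87, 0.67),
    "SAT Level 1+2+3": (0.89, 0.81),
}
for level, (proper_use, correct_rejection) in rates.items():
    false_alarm = 1 - correct_rejection  # assumed mapping, see lead-in
    print(level, round(d_prime(proper_use, false_alarm), 2),
          round(a_prime(proper_use, false_alarm), 2))
```

Under these assumptions both indices rise monotonically from Level 1 to Level 1+2+3, which matches the direction of the SDT result described above; the absolute values should not be read as the study's reported statistics.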

Using automation usage decisions as an objective measure of trust, however, only partially
gauges operators’ trust in the IA. Participants may not have trusted the system at all,
disregarded the IA’s recommendation, and manually solved each mission. This would
indicate that our analysis of our objective trust data was flawed, as overall system disuse
should decrease when the system becomes more transparent. Therefore, subjective trust
measures are needed to provide further insight into the participants’ trust. Subjective trust
results, using the modified Jian et al. (2000) scale, aligned with our objective trust findings;
therefore, we reject the possibility that operators failed to trust the IA and accept the more
parsimonious explanation that the increased transparency based on the SAT model increased
operator trust in the agent. Results for the “integrating and displaying information” and the
“decision and action selection automation” subscales provided evidence that our operators
trusted the IA’s recommendation more when the system was more transparent. This result is
consistent with the findings of Oduor and Wiebe (2008), in which transparency positively
affected trust calibration, suggesting that trust in human-agent teams is an important factor
in performance (Freedy et al. 2007). Further, we mention calibrated trust because we
hypothesized that calibrated trust would be associated with both greater objective and
subjective trust in the IA. Additionally, subjective trust subscale ratings were sensitive to
our system reliability manipulation. We manipulated the reliability of the IA’s
recommendation, but the information supporting agent transparency in the current system
provided to the participants was accurate for each mission. Therefore, trust was greater for
information analysis automation, which was accurate 100% of the time, than for
decision and action selection, which was accurate only 62.5% of the time for all
transparency levels. Taken together, these findings suggest that participants displayed
appropriate trust calibration.

The individual differences analyses failed to find significant effects of either our
spatial ability or perceived attention control measures on objective
or subjective trust scores across transparency levels. Previous studies have found that
differences in both perceived attentional control and spatial abilities were key to
understanding differences in task performance while managing multi-robotic systems
(Chen and Barnes 2012a, 2012b; Chen et al. 2008; Lathan and Tracey 2002). Therefore,
we hypothesized that perceived attentional control and spatial abilities would be important
factors in our task. However, our study differed in several key aspects from previous
supervisory control studies, which may have lessened individuals’ use of attentional or
spatial skills. Previous human-robot interaction studies have all used sensor feeds from
cameras as a component of their performance or decision-making tasks. In these tasks,
participants either had to use visual information to teleoperate an unmanned vehicle (Chen
et al. 2008; Lathan and Tracey 2002) or complete a threat detection task while making
route decisions for a team of robots (Chen and Barnes 2012b). Performance on our task
required integrating information displayed by both the IA and Intel to determine if the IA
was basing its decisions on a faulty premise or incomplete information. Further, our tasks
did not specifically require manipulating objects or the robot’s location in space, and the
map always remained in the same orientation; thus, the operators’ spatial abilities were
used less frequently than in the aforementioned teleoperation studies.

We also looked at another key individual difference, WMC, with regard to objective trust.
We found that individuals with low WMC performed worse in Levels 1 and 1+2+3,
while performance did not vary significantly with WMC in Level 1+2. Participants were
given only basic information in Level 1, and, to make accurate decisions, they had to
process and synthesize the information given to them; therefore, presumably, individuals
with higher WMC performed better at this task because of that capacity. Interestingly, we
found a similar pattern for Level 1+2+3, indicating that although participants were shown
uncertainty information, they still had to determine the effects of that uncertainty within
the context of each specific mission. Previous research has indicated that uncertainty
information adds working memory load and, consequently, individuals with low WMC are
more likely to rely on heuristics to resolve their memory load than individuals with higher
WMC (Quayle and Ball 2000).

Another potential effect of adding user interface elements to support agent
transparency is that individuals stop using basic information in favor of reasoning or
uncertainty elements (i.e., sprocket or text table). The results of the analyses performed on
strategy differences between transparency level conditions indicated that all Level 1
elements, except for the asset capability tile, were used similarly across transparency levels.
The asset capability tile was not a specific user interface element but rather an experimental
addition to prevent novice participants from having to memorize asset capabilities and
sensor payload information; therefore, the finding that individuals rated the asset capability
tile as more helpful in the Level 1 condition serves as a manipulation check. As the agent
became more transparent, users did not have to do as much work to determine the
correctness of the IA’s decisions, and thus the asset capability tile became less useful during
the Level 1+2 and Level 1+2+3 conditions. The condition order was counterbalanced, so we
can rule out confounds between participant usefulness ratings and experience using the
asset capability tile information. We also found that participants rated Level 2 user interface
elements as similarly helpful across both Level 1+2 and Level 1+2+3 conditions. Overall,
the sprocket and text table were rated as the most helpful pieces of Level 2 information.
The ratings for Level 3 information showed no significant differences between the
uncertainty displayed in the sprocket and that displayed in the text table. This finding was
similar to that of the Level 2 information where the sprocket was also rated as more helpful
than the text table. The structured strategy interviews revealed that, typically, participants
primarily relied on the sprocket and used the table information as a secondary source of
information. This finding is logical because the sprocket was a very salient element in the
display that conveyed information about priority, reasoning, and potential uncertainties
presented in the plan.

Our analysis of the overall strategies used in the training and experimental
blocks indicated several differences as operators learned to complete the task. During
training, individuals were new to the system and had little experience using
the interface. During the experimental blocks, they were much more experienced, since
participants had already gone through training as well as the evaluation blocks. The
change from training to experiment therefore reflects participants abandoning strategies that were not
particularly useful and making greater use of strategies they found helpful. During the
Level  1  condition,  participants  placed  emphasis  on  comparing  and  contrasting  the 
differences  between  plans  and  looking  at  the  information  bar.  During  the  experimental 
missions,  they  found  these  strategies  less  useful  and  instead  focused  on  both  capability 
differences  between  the  assets  and  using  the  Intel  received  during  each  mission.  During  the 
Level  1+2  condition,  participants  initially  used  vehicle  capability,  Intel,  and  text  table; 
however, after training, participants placed a greater emphasis on the sprocket, comparing
the differences between the plans using the sprocket, with many participants using the
sprocket as their sole strategy. This same pattern was also found in Level 1+2+3. Overall,
this finding indicates that participants spent less time manually checking each plan and
instead relied more on the system displays (sprocket and text table), using the Level
1 information to confirm or double-check their trust in the system.

With regard to our analyses of the response bias data, we did not find significant differences
that would indicate a particular bias between the IA’s recommended and backup plans at
any transparency level. Overall, decisions were somewhat liberal, which was expected due
to the greater percentage of Plan A scores. This finding further suggests that complacency
did not appear to contribute to participants’ decisions in the current study. One reason
for  this  finding  may  have  been  the  lack  of  workload  differences  between  transparency 
levels.  Greater  levels  of  workload  may  have  forced  a  certain  level  of  reliance  on  the  system 
due  to  the  cost  of  manually  solving  each  decision.  Previous  studies  have  found  significant 
workload  differences  between  different  levels  of  automation  assistance  (Wright  et  al. 
2013). Additionally, studies in other domains have found that increased workload can negatively affect
decision-making performance, causing operators to sacrifice performance rather than
optimize  performance  (Cummings  2006).  In  the  current  experiment,  complacency  could 
have  occurred  during  SAT  Level  1  due  to  the  decreased  transparency  of  the  system  and  the 
level  of  task-related  information  provided  to  the  participants  (Parasuraman  et  al.  1993).  At 
SAT  Level  1+2  and  SAT  Level  1+2+3,  we  hypothesized  complacency  could  occur  due  to 
the  increased  amount  of  information  in  the  display  that  may  create  higher  mental  demand 
(Inagaki  and  Itoh  2013).  Therefore,  the  nonsignificant  response  bias  findings  indicate  that 
operator  responses  were  a  function  of  the  level  of  information  provided  by  the  system, 
rather  than  an  indicator  of  workload.  As  previously  stated,  we  failed  to  find  any  significant 
differences  in  workload  across  conditions  measured  objectively  or  subjectively. 
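
The terms used above (response bias, "liberal" decisions, perceptual sensitivity) come from signal detection theory. The report does not state which bias index was computed, so the following is only a minimal sketch of one common choice: sensitivity d' and criterion c derived from hit and false-alarm counts. The function name, the log-linear correction, and the mapping of "hit" to correctly accepting the IA's recommended plan are all assumptions made for illustration, not details taken from the study.

```python
from statistics import NormalDist


def sdt_measures(hits, misses, false_alarms, correct_rejections):
    """Return (d_prime, criterion_c) from raw response counts.

    Assumed mapping (hypothetical, not confirmed by the report):
    a "hit" is accepting the recommended plan when it was correct,
    a "false alarm" is accepting it when the backup plan was correct.
    """
    # Log-linear correction (an assumption) keeps rates away from 0 and 1,
    # where the inverse normal CDF is undefined.
    hit_rate = (hits + 0.5) / (hits + misses + 1)
    fa_rate = (false_alarms + 0.5) / (false_alarms + correct_rejections + 1)

    z = NormalDist().inv_cdf  # inverse of the standard normal CDF
    d_prime = z(hit_rate) - z(fa_rate)
    criterion_c = -0.5 * (z(hit_rate) + z(fa_rate))
    return d_prime, criterion_c


if __name__ == "__main__":
    d, c = sdt_measures(hits=9, misses=1, false_alarms=5, correct_rejections=5)
    print(f"d' = {d:.2f}, c = {c:.2f}")
```

Under this convention a negative c indicates a liberal tendency (a readiness to accept the recommended plan), a positive c indicates a conservative one, and c near zero indicates no bias.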

In  addition,  we  also  failed  to  find  increased  response  time  as  a  function  of  transparency 
level.  Previous  research  has  found  speed  accuracy  trade-offs  as  well  as  an  association 
between  speed  and  workload  indicative  of  additional  processing  requirements  (Helldin 
et  al.  2014).  Our  results  indicated  that  participants’  perception  of  mental  demand  and  effort 
were consistently greater than other subscales across transparency levels, suggesting that
the task primarily stressed mental demand and effort, which is appropriate for the task
and supports the validity of the subjective workload findings. Additionally, our analysis of
the TLX data revealed an interesting potential trend. Frustration decreased as transparency
information increased. This finding, along with our other findings, aligns well with the SUS
findings,  which  indicated  that  participants’  perceived  usability  of  the  system  increased  as 
transparency  information  increased.  Thus,  usability  could  be  an  underlying  factor  in 
participants’  performance,  trust,  and  workload. 

We  further  analyzed  eye-tracking  data  (fixation  duration  and  pupil  diameter)  as  a  measure 
of  objective  workload.  Aligning  with  the  TLX  scores,  we  did  not  find  a  change  in  workload 
as  we  added  transparency  information.  The  goal  of  transparency  information  is  to  mitigate 
workload by offloading information synthesis to the IA and displaying that information to
the operator in a meaningful way. These findings support this idea: the benefits of transparency
do not appear to come at the cost of increased workload.

While no significant differences in workload were found generally, we assessed the effects
of our individual differences measures. In doing so we found an interesting dissociation
between spatial abilities and our eye-tracking measures. We found significant differences
between high and low SpaV groups for fixation duration but not pupil diameter. The reverse
pattern occurred for the SpaO groups, with significant differences for
pupil diameter but not fixation duration. Fixation duration has been linked to workload and
stress  as  well  as  more  effortful  scene  processing.  Some  studies  have  shown  shorter  fixation 
durations for higher workload conditions that lead to greater visual scanning (Van Orden
et al. 2001). Therefore, it appears participants with lower spatial abilities were under more
stress  than  those  with  higher  spatial  abilities.  Individuals  with  higher  spatial  abilities  were 
able  to  focus  on  more  critical  parts  of  the  interface  while  those  with  lower  spatial 
visualization  skills  were  forced  to  scan  around  the  interface. 

Decreased  pupil  diameter  has  previously  been  found  to  be  an  indicator  of  fatigue 
(Holmqvist  et  al.  2011);  thus,  it  is  possible  that  individuals  with  lower  spatial  orientation 
skills  became  fatigued  faster  by  the  amount  of  information  in  the  interface  than  those  with 
greater  spatial  abilities.  Increased  pupil  diameter,  on  the  other  hand,  has  also  been 
associated  with  increased  cognitive  workload  (Van  Orden  et  al.  2001).  This  possibility  is 
seemingly  at  odds  with  the  previous  findings,  as  both  SpaV  and  SpaO  are  related  abilities. 
Since  individuals  with  higher  SpaO  are  better  able  to  integrate  and  process  information 
from the environment and take different perspectives given a single egocentric viewpoint,
we  believe  that  those  with  higher  spatial  abilities  tried  to  directly  compare  the  maps  among 
each  other,  while  individuals  with  lower  spatial  abilities  may  not  have  relied  on  other 
information  in  the  environment.  This  possible  behavior  may  also  explain  the  interaction  in 
SAT Level 1+2+3, as individuals with higher SpaV abilities may have attempted to
integrate the projections of uncertainty into the map to better understand its effects on the
plan.

4.1  Future  Work 

In  the  current  experiment  we  defined  our  SAT  Level  3  manipulation  as  the  display  of 
uncertainty.  While  uncertainty  is  a  key  variable  that  affects  how  conclusions  are  made  on 
projected  information,  after  further  consideration  we  believe  that  uncertainty  may  apply  to 
each  level  of  the  SAT  model,  as  it  does  not  solely  reflect  the  system’s  future  state.  For 
example,  sensor  errors  may  make  basic  information  uncertain,  and  uncertainties  in  SAT 
Level 2 may make the operator circumspect about the IA’s reasoning process. Additionally,
both of these uncertainties can be separated from SAT Level 3 uncertainties about the IA’s
projections  of  future  states.  Therefore,  future  research  should  investigate  how 
incorporating  uncertainty  into  each  level  of  transparency  could  affect  performance,  trust, 
and  workload.  We  have  one  such  study  planned  using  a  more  ecologically  valid  interface, 
the  AFRL  Fusion  test  bed;  however,  information  used  to  support  agent  transparency  may 
be  somewhat  contextually  dependent,  meaning  each  system  needs  to  determine  which 
display  elements  are  best  for  that  specific  system. 

4.2  Conclusion 

Our  findings  are  increasingly  important  to  facilitate  decision  making  between  the  human 
operator  and  complex  automated  systems.  Since  automation  is  a  key  part  of  future  systems, 
operators will need to rely on advanced automation, such as IAs, to enhance mission
effectiveness due to the increased level of information flow (Paas and Merrienboer 1994).
We examined the level of information received from the IA needed to create an effective,
transparent interface, specifically addressing 3 issues: performance, trust, and workload.

Unlike  Helldin  et  al.  (2014),  who  found  that  increased  transparency  resulted  in  increased 
performance  and  trust  calibration  at  the  cost  of  greater  workload  and  longer  response  time, 
our  results  support  the  addition  of  transparency  information,  loosely  based  on  the  SAT 
model  (Chen  et  al.  2014).  The  addition  of  transparency  information  greatly  improved 
decision-making  accuracy  and  perceptual  sensitivity  without  cost  to  speed  or  increased 
workload.  These  findings  align  well  with  our  trust  and  usability  data.  Both  trust  subscales 
suggest that participants trusted the IA’s recommendation more when the system was more
transparent.  Similarly,  SUS  findings  indicated  that  participants’  perceived  usability  of  the 
system  increased  as  transparency  information  increased. 

Appendix A. Demographics


This  appendix  appears  in  its  original  form,  without  editorial  change. 

Demographic Questionnaire


Participant  # 


Gender 


Age 


Major 


Date 


1.  What  is  the  highest  level  of  education  you  have  had?  (Circle  one  only) 

a)  Less  than  4  yrs  of  college  b)  Completed  4  yrs  of  college  c)  Other 


□ Grade School
□ High School
□ College
□ Jr. High
□ Technical School
□ Did Not Use

3. Where do you currently use a computer? (Check all that apply)

□ Home
□ Work
□ Library
□ Other _
□ Do Not Use

4. For each of the following questions, circle the response that best describes you.

How often do you:

(Response options for each item: Daily / Weekly / Monthly / Once every few months / Rarely / Never)

Use a mouse?
Use a joystick?
Use a touch screen?
Use icon-based programs/software?
Use programs/software with pull-down menus?
Use graphics/drawing features in software packages?
Use E-Mail?
Operate a radio controlled vehicle (car, boat, or plane)?
Play computer/video games?

5. Which type(s) of computer/video games do you most often play if you play
at least once every few months?:

6. Which of the following best describes your expertise with computers?

(Circle one only)

a)  Novice 

b) Good with one type of software package (such as word processing or
slides)

c) Good with several software packages

d) Can program in one language and use several software packages

e) Can program in several languages and use several software packages

7. Are you in your good/comfortable state of health physically? YES NO

If NO, please briefly explain:

8.  How  many  hours  of  sleep  did  you  get  last  night?  _ hours 

9.  Do  you  have  normal  color  vision?  YES  NO 

10.  Do  you  have  prior  military  service?  YES  NO 

If  YES,  how  long?:  _ years 

Please answer the following questions about how you play video games by
circling a number on the provided scale, from 1 (strongly disagree) to 6
(strongly agree).

(Response scale for each item: 1 2 3 4 5 6, where 1 = Strongly Disagree and 6 = Strongly Agree)

11. I can always manage to solve difficult problems within a video game if I try hard enough.

12. In a video game, if someone opposes me, I can find the means and ways to get what I want.

13. It is easy for me to stick to my plans and accomplish my goals in a video game.

14. I am confident that I could deal efficiently with unexpected events in a video game.

15. Thanks to my resourcefulness, I know how to handle unforeseen situations in a video game.

16. I can solve most problems in a video game if I invest the necessary effort.

17. I can remain calm when facing difficulties in a video game because I can rely on my coping abilities.

18. When I am confronted with a problem in a video game, I can usually find several solutions.

19. If I am in trouble in a video game, I can usually think of a solution.

20. I can usually handle whatever comes my way in a video game.

Please answer the following questions about how you feel about automation by
circling a number on the provided scale, from 1 (strongly disagree) to 5 (strongly
agree).

(Response scale for each item: 1 2 3 4 5, where 1 = strongly disagree and 5 = strongly agree)

1. I usually trust automation until there is a reason not to.

2. For the most part, I DISTRUST automation.

3. In general, I would rely on automation to assist me.

4. My tendency to trust automation is high.

5. It is easy for me to trust automation to do its job.

6. I am likely to trust automation even when I have little knowledge about it.

Appendix B. Trust Scale


This  appendix  appears  in  its  original  form,  without  editorial  change. 

Trust Survey


For  each  of  the  following  items  and  situations,  circle  the  number  which  best 
describes  your  feeling  or  your  impression  based  on  the  system  you  just  used.  For 
each  item,  consider  the  following  situations: 


•  A:  When  the  system  is  collecting  and/or  highlighting/filtering  information. 

•  B:  When  the  system  is  integrating  information,  generating  predictive 
displays,  and/or  presenting  its  analysis. 

•  C:  When  the  system  is  making  decisions  and/or  selecting  actions. 

•  D:  When  the  system  is  executing  actions. 


Each item below is rated separately for each of the four situations on a scale from 1 to 7, anchored "not at all", "neutral", and "extremely":

A: Gathering or Filtering Information
B: Integrating and Displaying Analyzed Information
C: Suggesting or Making Decisions
D: Executing Actions

1. The system is deceptive when...

2. The system behaves in an underhanded manner when...

3. I am suspicious of the system’s intent, action, or outputs when...

4. I am wary of the system when...

5. The system’s actions will have a harmful or injurious outcome when...

6. I am confident in the system when...

7. The system provides security when...

8. The system has integrity when...

9. The system is dependable when...

10. The system is reliable when...

11. I can trust the system when...

12. I am familiar with the system when...

13. The system is predictable when...

14. The system meets the needs of the mission when...

15. The system provides appropriate information when...

16. The system malfunctions when...

Now imagine that you are employed as an unmanned vehicle operator to complete
missions. Reflecting on the experience with the system you just used, please rate
the extent to which you agree with each of these items by circling a value from 1
(strongly disagree) to 7 (strongly agree), where 4 is neutral.

(Response scale for each item: 1 2 3 4 5 6 7, where 1 = Strongly Disagree, 4 = Neutral, and 7 = Strongly Agree)

17. Using the system would improve my job performance.

18. Using the system would make it easier to do my job.

19. I would find the system useful in my job.

20. Learning to operate the system is easy for me.

21. It is easy for me to become skillful at using the system.

22. I find the system easy to use.

23. I intend to use this system for my job.

Appendix C. National Aeronautics and Space Administration Task Load Index (NASA-TLX)


This  appendix  appears  in  its  original  form,  without  editorial  change. 

NASA-TLX Questionnaire


Please rate your overall impression of demands imposed on you during the
exercise.

1. Mental Demand: How much mental and perceptual activity was required (e.g.,
thinking, looking, searching, etc.)? Was the task easy or demanding, simple or
complex, exacting or forgiving?

LOW  1  2  3  4  5  6  7  8  9  10  HIGH

2. Physical Demand: How much physical activity was required (e.g., pushing,
pulling, turning, controlling, activating, etc.)? Was the task easy or demanding,
slow or brisk, slack or strenuous, restful or laborious?

LOW  1  2  3  4  5  6  7  8  9  10  HIGH

3. Temporal Demand: How much time pressure did you feel due to the rate or
pace at which the task or task elements occurred? Was the pace slow and leisurely
or rapid and frantic?

LOW  1  2  3  4  5  6  7  8  9  10  HIGH

4. Level of Effort: How hard did you have to work (mentally and physically) to
accomplish your level of performance?

LOW  1  2  3  4  5  6  7  8  9  10  HIGH

5. Level of Frustration: How insecure, discouraged, irritated, stressed and
annoyed versus secure, gratified, content, relaxed and complacent did you feel
during the task?

LOW  1  2  3  4  5  6  7  8  9  10  HIGH

6. Performance: How successful do you think you were in accomplishing the
goals of the task set by the experimenter (or yourself)? How satisfied were you
with your performance in accomplishing these goals?

LOW  1  2  3  4  5  6  7  8  9  10  HIGH

Pairwise Comparison of Factors

Select  the  member  of  each  pair  that  provided  the  most  significant  source  of 
workload  variation  in  these  tasks. 

Physical  Demand  vs.  Mental  Demand 

Temporal  Demand  vs.  Mental  Demand 

Performance  vs.  Mental  Demand 

Frustration  vs.  Mental  Demand 

Effort  vs.  Mental  Demand 

Temporal  Demand  vs.  Physical  Demand 

Performance  vs.  Physical  Demand 

Frustration  vs.  Physical  Demand 

Effort  vs.  Physical  Demand 

Temporal  Demand  vs.  Performance 

Temporal  Demand  vs.  Frustration 

Temporal  Demand  vs.  Effort 

Performance  vs.  Frustration 

Performance  vs.  Effort 

Effort  vs.  Frustration 
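
The six ratings and the 15 pairwise comparisons above are conventionally combined into a single weighted workload score: each factor's weight is the number of times it was selected across the 15 pairs, and the overall score is the weighted mean of the six ratings. The sketch below illustrates that standard procedure on the 1 to 10 scale used in this questionnaire. The function names and data layout are hypothetical, and the report does not specify whether weighted or unweighted scores were analyzed.

```python
from itertools import combinations

FACTORS = [
    "Mental Demand", "Physical Demand", "Temporal Demand",
    "Performance", "Effort", "Frustration",
]


def tlx_weights(choices):
    """Count how often each factor was chosen across the 15 pairs.

    choices: dict mapping frozenset({factor_a, factor_b}) -> chosen factor.
    """
    expected_pairs = {frozenset(pair) for pair in combinations(FACTORS, 2)}
    if set(choices) != expected_pairs:
        raise ValueError("exactly one choice is needed for each of the 15 pairs")
    weights = {factor: 0 for factor in FACTORS}
    for pair, chosen in choices.items():
        if chosen not in pair:
            raise ValueError(f"{chosen!r} is not a member of its pair")
        weights[chosen] += 1
    return weights


def weighted_tlx(ratings, weights):
    """Weighted mean of the six ratings; the weights always sum to 15."""
    total_weight = sum(weights.values())
    return sum(ratings[f] * weights[f] for f in FACTORS) / total_weight


if __name__ == "__main__":
    # Hypothetical participant who always picks the factor listed first in FACTORS.
    choices = {frozenset(p): p[0] for p in combinations(FACTORS, 2)}
    ratings = {f: 1 for f in FACTORS}
    ratings["Mental Demand"] = 10
    print(weighted_tlx(ratings, tlx_weights(choices)))
```

A factor that is never selected receives a weight of zero and therefore does not contribute to the overall score.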


Appendix  D.  System  Usability  Scale 


This  appendix  appears  in  its  original  form,  without  editorial  change. 

System Usability Scale


Please  answer  the  following  questions  about  the  system  you  just  used  by  circling 
a  number  on  the  provided  response  scale,  from  1  (strongly  disagree)  to  5 
(strongly  agree). 


(Response scale for each item: 1 2 3 4 5, where 1 = Strongly Disagree and 5 = Strongly Agree)

1. I think that I would like to use this system frequently.

2. I found the system unnecessarily complex.

3. I thought the system was easy to use.

4. I think that I would need the support of a technical person to be able to use this system.

5. I found the various functions in this system were well integrated.

6. I thought there was too much inconsistency in this system.

7. I would imagine that most people would learn to use this system very quickly.

8. I found the system very awkward to use.

9. I felt very confident using the system.

10. I needed to learn a lot of things before I could get going with this system.
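
The ten items above alternate between positively and negatively worded statements, which is why SUS responses are not simply summed. As a minimal sketch of the standard SUS scoring rule (the function name is hypothetical, and the report does not describe its own scoring code): odd-numbered items contribute the response minus 1, even-numbered items contribute 5 minus the response, and the sum is multiplied by 2.5 to give a score from 0 to 100.

```python
def sus_score(responses):
    """Compute a System Usability Scale score from ten 1-5 responses.

    responses: sequence of ten integers, in item order (item 1 first).
    """
    if len(responses) != 10:
        raise ValueError("SUS requires exactly ten responses")
    if any(r not in (1, 2, 3, 4, 5) for r in responses):
        raise ValueError("each response must be an integer from 1 to 5")

    total = 0
    for item_number, response in enumerate(responses, start=1):
        if item_number % 2 == 1:
            total += response - 1   # positively worded (odd) items
        else:
            total += 5 - response   # negatively worded (even) items
    return total * 2.5


if __name__ == "__main__":
    print(sus_score([4, 2, 4, 2, 4, 2, 4, 2, 4, 2]))
```

Higher scores indicate higher perceived usability.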

Appendix E. Attentional Control Scale


This  appendix  appears  in  its  original  form,  without  editorial  change. 

Attentional Control Survey

For  each  of  the  following  questions,  circle  the  response  that  best  describes  you. 


(Response options for each item: Almost Never / Sometimes / Often / Always)

It is very hard for me to concentrate on a difficult task when there are noises around.

When I need to concentrate and solve a problem, I have trouble focusing my attention.

When I am working hard on something, I still get distracted by events around me.

My concentration is good even if there is music in the room around me.

When concentrating, I can focus my attention so that I become unaware of what’s going on in the room around me.

When I am reading or studying, I am easily distracted if there are people talking in the same room.

When trying to focus my attention on something, I have difficulty blocking out distracting thoughts.

I have a hard time concentrating when I’m excited about something.

When concentrating, I ignore feelings of hunger or thirst.

I can quickly switch from one task to another.

It takes me a while to get really involved in a new task.

It is difficult for me to coordinate my attention between the listening and writing required when taking notes during lectures.

I can become interested in a new topic very quickly when I need to.

It is easy for me to read or write while I’m also talking on the phone.

I have trouble carrying on two conversations at once.

I have a hard time coming up with new ideas quickly.

After being interrupted or distracted, I can easily shift my attention back to what I was doing before.

When a distracting thought comes to mind, it is easy for me to shift my attention away from it.

It is easy for me to alternate between two different tasks.

It is hard for me to break from one way of thinking about something and look at it from another point of view.



Appendix  F.  Cube  Comparison  Test 


This  appendix  appears  in  its  original  form,  without  editorial  change. 

Cube Comparisons Test


Participant  # 


Date 


CUBE COMPARISONS TEST - S-2 (Rev.)

Wooden blocks such as children play with are often cubical with a different
letter, number, or symbol on each of the six faces (top, bottom, four sides).

Each problem in this test consists of drawings of pairs of cubes or blocks of
this kind. Remember, there is a different design, number, or letter on each face
of a given cube or block. Compare the two cubes in each pair below.


[Figure: two example pairs of cube drawings, each with S and D answer boxes]


The first pair is marked D because they must be drawings of different cubes.

If the left cube is turned so that the A is upright and facing you, the N would be
to the left of the A and hidden, not to the right of the A as is shown on the right
hand member of the pair. Thus, the drawings must be of different cubes.

The second pair is marked S because they could be drawings of the same cube.
That is, if the A is turned on its side the X becomes hidden, the B is now on top,
and the C (which was hidden) now appears. Thus the two drawings could be of the
same cube.

Note: No letters, numbers, or symbols appear on more than one face of a given
cube. Except for that, any letter, number or symbol can be on the hidden faces of
a cube.


Work the three examples below.


[Figure: three example pairs of cube drawings, each with S and D answer boxes]


The first pair immediately above should be marked D because the X cannot be at
the peak of the A on the left hand drawing and at the base of the A on the right
hand drawing. The second pair is "different" because P has its side next to G on
the left hand cube but its top next to G on the right hand cube. The blocks in the
third pair are the same, the J and K are just turned on their side, moving the H to
the top.

Your score on this test will be the number marked correctly minus the number
marked incorrectly. Therefore, it will not be to your advantage to guess unless you
have some idea which choice is correct. Work as quickly as you can without
sacrificing accuracy.
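
The scoring rule stated above (number marked correctly minus number marked incorrectly) can be sketched in a few lines of Python. This is a minimal illustration, not the scoring procedure used in the study: the function name, the "S"/"D" response coding, and the choice to treat unmarked items as neither correct nor incorrect are assumptions made for the example.

```python
def cube_comparison_score(responses, answer_key):
    """Return the number marked correctly minus the number marked incorrectly.

    responses  -- list of "S", "D", or None (None = item left unmarked; assumed
                  here to earn no credit and no penalty)
    answer_key -- list of "S" or "D", one entry per item
    """
    correct = 0
    incorrect = 0
    for response, answer in zip(responses, answer_key):
        if response is None:
            # Unmarked item: skipped, so it does not change the score.
            continue
        if response == answer:
            correct += 1
        else:
            incorrect += 1
    return correct - incorrect


# Hypothetical four-item example: two correct, one incorrect, one unmarked.
example_score = cube_comparison_score(["S", "D", None, "D"], ["S", "S", "D", "D"])
```

Under this rule a wrong mark cancels a right one, which is why the instructions advise against blind guessing.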

You will have 3 minutes for each of the two parts of this test. Each part has
one page. When you have finished Part 1, STOP.

DO NOT TURN THE PAGE UNTIL YOU ARE ASKED TO DO SO.

Copyright 1962, 1976 by Educational Testing Service. All rights reserved.


DO NOT GO ON TO THE NEXT PAGE UNTIL ASKED TO DO SO. STOP.


Appendix G. Spatial Orientation Test


This appendix appears in its original form, without editorial change.


Spatial Orientation Test

The Spatial Orientation Test, modeled after the cardinal direction test developed by Gugerty and
his colleagues (Gugerty & Brooks, 2004), is a computerized test consisting of a brief training
segment and 32 test questions. The program automatically captures both accuracy and response
time. Participants are shown the following image:


The right side image is of a map showing a plane flying. The left side of the display is the pilot's
view (from the cockpit of the plane) of several parking lots surrounding a building. The
participants' task is to use the right side of the display to learn in which direction the plane is
flying. They then use this information to identify which parking lot (north, south, east, or west)
in the left side image has the dot. In the example shown above, the plane is heading north, and so
the dot appears in the north parking lot. In the example shown below, the plane is heading south,
and so the dot appears in the east parking lot.
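
The reasoning a participant has to perform, converting a lot's position in the pilot's view into a cardinal direction given the plane's heading, can be sketched as follows. This is an illustrative sketch only, not the test program used in the study: the function names, the four egocentric labels ("ahead", "right", "behind", "left"), and the trial generator are assumptions introduced for the example.

```python
import random

# Cardinal directions in clockwise order.
CARDINALS = ["north", "east", "south", "west"]

# Position of a lot in the pilot's view, expressed as the number of
# clockwise quarter turns away from straight ahead (hypothetical labels).
RELATIVE_TURNS = {"ahead": 0, "right": 1, "behind": 2, "left": 3}


def dot_direction(heading, relative_position):
    """Return the cardinal direction of a lot seen from the cockpit."""
    start = CARDINALS.index(heading)
    return CARDINALS[(start + RELATIVE_TURNS[relative_position]) % 4]


def make_trials(n_trials=32, seed=None):
    """Generate randomized (heading, view position, correct answer) trials."""
    rng = random.Random(seed)
    trials = []
    for _ in range(n_trials):
        heading = rng.choice(CARDINALS)
        position = rng.choice(list(RELATIVE_TURNS))
        trials.append((heading, position, dot_direction(heading, position)))
    return trials
```

For a plane heading south, a lot on the pilot's left lies to the east, so `dot_direction("south", "left")` returns `"east"`; for a plane heading north, a lot straight ahead lies to the north.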


Participants are shown 32 of these images in succession; each time the direction the plane is
flying and the location of the dot are randomized. Participants answer by clicking on one of four
buttons (North, South, East, or West). This test is self-paced; the participant may take as long as
they wish to answer, and when they answer one question the next question automatically
appears. No questions can be skipped, and the order of images is randomized among participants.


Appendix H. Sense of Direction Questionnaire


This appendix appears in its original form, without editorial change.


SDQ-S


Please answer the following questions by circling a number on the provided
response scale, from 1 (strongly disagree) to 5 (strongly agree).


Response scale: 1 = Strongly Disagree, 2 = Disagree, 3 = Neutral, 4 = Agree, 5 = Strongly Agree

1. I can make correct choices as to cardinal directions in an unfamiliar place.
   1   2   3   4   5

2. I have become confused, as to cardinal directions, when I am in an unfamiliar place.
   1   2   3   4   5

3. I have difficulties identifying the moving direction of a train with regard to cardinal direction.
   1   2   3   4   5

4. When I get route information, I can make use of "left or right" information, but I can't use cardinal directions.
   1   2   3   4   5

5. I can't make out which direction my room in a hotel faces.
   1   2   3   4   5

6. I can tell where I am on a map.
   1   2   3   4   5

7. I can visualize the route as a map-like image.
   1   2   3   4   5

8. I feel anxious about my walking direction in an unfamiliar area.
   1   2   3   4   5

9. I have poor memory for landmarks.
   1   2   3   4   5

10. I cannot remember landmarks found in the area where I have often been.
   1   2   3   4   5

11. I can't use landmarks in wayfinding.
   1   2   3   4   5

12. I can't remember the different aspects of sceneries.
   1   2   3   4   5

13. I often can't find the way even if given detailed verbal information on the route.
   1   2   3   4   5

14. I have a lot of difficulties reaching the unknown place even after looking at a map.
   1   2   3   4   5

15. I often (or easily) forget which direction I turned.
   1   2   3   4   5

16. I become totally confused as to the correct sequence of the return way as a consequence of a number of left-right turns in the route.
   1   2   3   4   5

17. I can't verify landmarks in a turn of the route.
   1   2   3   4   5


Appendix I. Personal Involvement Measure


This appendix appears in its original form, without editorial change.


Personal Involvement Measure


Reflecting  on  the  experience  with  the  system  you  just  used,  please  rate  the  extent 
to  which  you  agree  with  each  of  these  items  by  circling  a  value  from  1  (strongly 
disagree)  to  7  (strongly  agree),  where  4  is  neutral. 


Response scale: 1 = Strongly Disagree, 4 = Neutral, 7 = Strongly Agree

1. I was uninterested in the task.
   1   2   3   4   5   6   7

2. Doing well in the task was important to me.
   1   2   3   4   5   6   7

3. The task was trivial.
   1   2   3   4   5   6   7

4. The task mattered to me.
   1   2   3   4   5   6   7

5. I was motivated to do the task.
   1   2   3   4   5   6   7

6. I was unconcerned with doing well in the task.
   1   2   3   4   5   6   7



Appendix  J.  Structured  Strategy  Interview  Questions 


This appendix appears in its original form, without editorial change.


Structured Strategy Interview Questions

Part  1. 

Please answer the following questions regarding the last set of decisions you have
made. When you answer only think of the missions you just completed.

1.  Can you describe the overall process you used to make a decision?  How
did  you  decide  which  plan  to  choose  between  the  two  plans  that  you  were 
presented  with? 

2.  Overall,  which  parts  of  the  system  or  display  elements  did  you  consider 
when  making  a  decision?  List  all  the  parts  that  you  used. 

3.  Were  there  any  decisions  that  you  relied  more  on  certain  parts  of  the 
system  than  others?  Or  any  that  certain  parts  of  the  system  were  not 
helpful  at  all  to  your  decision  making  process? 

4.  If  you  only  had  one  piece  of  information  from  the  system  to  use  to  solve 
all  of  the  decisions  you  encountered  which  one  would  you  want  to  use? 


Part  2. 

Previously  you  mentioned  several  parts  of  the  system  that  you  used  to  make  a 
decision.  Now  you  are  going  to  rate  each  part  of  the  system  on  a  1-7  scale.  A 
rating  of  a  1  indicates  this  part  was  not  helpful  at  all,  while  a  rating  of  a  7 
indicates  it  was  extremely  helpful  to  your  decision  making  process. 


1. Play name (what the play was called)
   1   2   3   4   5   6   7

2. Play details tile
   1   2   3   4   5   6   7

3. Vehicle status indicator
   1   2   3   4   5   6   7

4. Information Bar
   1   2   3   4   5   6   7

5. Plan colors (colors of vehicles and map elements)
   1   2   3   4   5   6   7

6. Asset capability display
   1   2   3   4   5   6   7

7. Vehicle Sizes
   1   2   3   4   5   6   7

8. Map (locations of icons on map, vehicles etc...)
   1   2   3   4   5   6   7

9. Intel Alerts
   1   2   3   4   5   6   7

10. Equalizer Display
   1   2   3   4   5   6   7

11. Text Table
   1   2   3   4   5   6   7

12. Equalizer display uncertainty
   1   2   3   4   5   6   7

13. Table Uncertainty
   1   2   3   4   5   6   7

14. Vehicle Uncertainty
   1   2   3   4   5   6   7

15. Vehicle path uncertainty
   1   2   3   4   5   6   7


Part 3.

1.  In today's experiment you had three different system layouts with different
types of Information. Which one did you prefer? Were decisions easier
with one layout than the others?

2.  Were  there  any  parts  of  the  system  that  gave  you  consistency,  conflicting 
or  hard  to  understand  Information?  Please  be  as  detailed  as  possible. 

3.  Do  you  have  any  other  comments  for  us  about  the  experiment? 




List  of  Symbols,  Abbreviations,  and  Acronyms 


3Ps         purpose, process, performance
AFRL        US Air Force Research Laboratory
ANOVA       analysis of variance
AVGP        action video game player
CI          confidence interval
CR          correct IA rejection
FD          fixation duration
GE          gaming experience
IA          intelligent agent
ID          individual difference
Intel       intelligence
ISO         International Organization for Standardization
MANOVA      multivariate analysis of variance
NASA-TLX    National Aeronautics and Space Administration Task Load Index
OSPAN       Operation Span
PAC         perceived attentional control
PD          pupil diameter
PU          proper IA use
RT          response time
SA          situation awareness
SAT         SA-based agent transparency
SDT         signal detection theory
SEM         standard error of the mean
SMI RED     SensoMotoric Instruments Remote Eye-tracking Device
SpaO        spatial orientation
SpaV        spatial visualization
SUS         System Usability Scale
UAV         unmanned aerial vehicle
UGV         unmanned ground vehicle
USV         unmanned surface vehicle
UxV         multi-unmanned vehicle
WMC         working memory capacity


RDRL HRT T R SOTTILARE
RDRL  HRT  B  N  FINKELSTEIN 
RDRL  HRT  G  A  RODRIGUEZ 
RDRL  HRT  I  J  HART 
RDRL  HRT  M  C  METEVIER 
RDRL  HRT  S  B  PETTIT 
12423  RESEARCH  PARKWAY 
ORLANDO FL 32826

1 ARMY RSCH LAB - HRED
(PDF) RDRL HRM DE A MARES

1733 PLEASONTON RD BOX 3
FORT  BLISS  TX  79916-6816 

1  ARMY  RSCH  LAB  -  HRED 
(PDF)  HQ  USASOC 

RDRL  HRM  CN  R  SPENCER 
BLDG  E2929  DESERT  STORM  DR 
FORT  BRAGG  NC  28310 

1  ARMY  G1 
(PDF)  DAPE  MR  B  KNAPP 
300  ARMY  PENTAGON 
RM  2C489 

WASHINGTON  DC  20310-0300 
ABERDEEN  PROVING  GROUND 


17 DIR USARL
(PDF)  RDRL  HR 

L  ALLENDER 
P  FRANASZCZUK 
K  MCDOWELL 
RDRL  HRM 

P  SAVAGE-KNEPSHIELD 
RDRL  HRM  AL 
C  PAULILLO 
RDRL  HRM  AR 
J  MERCADO 
RDRL  HRM  AT 
J CHEN
M RUPP
RDRL  HRM  AY 
M  BARNES 
RDRL  HRM  B 
J GRYNOVICKI
RDRL  HRM  C 
L  GARRETT 
RDRL  HRS 
J  LOCKETT 
RDRL  HRS  B 
M  LAFIANDRA 
RDRL  HRS  D 
A  SCHARINE 
RDRL  HRS  E 
D  HEADLEY 
RDRL  SL 
D  BAYLOR 
RDRL  SLE 
R FLORES